An Experimental Investigation of “Point” Job Evaluation 
Systems ¹ 


James H. Myers 


Prudential Insurance Company 


Scientific investigation of “point” job evaluation 
systems dates back to Lawshe, who, during 
the middle 1940’s, undertook a series of 
statistical studies aimed at uncovering the 
basic factors operating in systems of this type. 
After five studies (5, 6, 8, 9, 10) involving 
four factor analyses, his results were charac- 
terized by one general factor which accounted 
for between 77 and 99 per cent of the variance 
in total job value and which correlated highly 
with the rating on nearly every job require- 
ment. He called this factor “Skill Demands.” 

Subsequent factor analyses by other in- 
vestigators (1, 3, 4, 11) have confirmed the 
fact that although a number of basic factors 
might emerge in a particular factor analysis, 
in most studies one out of the number would 
account for upwards of 80 per cent of the 
variance in total points, or job value. 

One additional study by Lawshe, Dudek, 
and Wilson (7), however, produced results 
which were quite different from those of all 
other studies. In this particular study, 40 
jobs were rated under two job evaluation sys- 
tems, one with 11 items, the other with four. 
Factor analysis of both systems produced five 
basic factors for each. In contrast to other 
studies, however, in the 11-item system, four 
of the five factors each accounted for between 
10% and 37% of the total point variance, 
while in the four-item system each of the five 
factors accounted for between 4% and 53% of 
the total point variance. 

In personal experience with job evaluation 
in industry, the author had observed that 
evaluations within a given company might be 
subject to manipulation or “forcing” by evaluators 
in order to produce a desired total 
evaluation or pay range for a job. The less 
accurate the system in operation, the greater 
the need for forcing of job ratings. It was 
felt that differences in results between the 
particular study (7) mentioned above and 
other studies were due in some measure to the 
fact that evaluations in the former were not 
forced (since they were made by relatively 
disinterested evaluators in a number of different 
companies who would have had no incentive 
to force ratings), while those in other 
studies probably were forced. 


1 This work was done as partial fulfillment of 
requirements for a Ph.D. degree at the University of 
Southern California, 1956. 

The present study was undertaken to de- 
termine the extent to which the forcing of job 
ratings could influence results to be obtained 
from statistical investigations of point job 
evaluation systems. 


Method 


In order to determine the effects forcing can have, 
it was necessary to create an experimental situation 
wherein forcing could be made to occur under con- 
trolled conditions, so that comparisons could be made 
between forced and unforced evaluations. Accordingly, 
a sample of jobs was first evaluated under 
conditions designed to greatly reduce or eliminate the 
possibility of forcing. Then the ratings of the same 
jobs were “forced” to predetermined standards, as de- 
scribed below. It was then possible to note differ- 
ences between forced and unforced evaluations in 
terms of factor structure and factor loadings, to de- 
termine the effects of forcing. 

This procedure had the further advantage of hold- 
ing constant the effects of “halo” in the job ratings 
(halo might be thought of as an unintentional bias, 
while forcing would be more in the nature of an 
intentional one). Both forced and unforced evaluations 
undoubtedly contained halo,² but under the conditions 
of the experiment, differences in the two sets of rat- 
ings should have been mainly due to the effects of 
forcing alone. 

Systems, jobs, and raters. Although a point job 
evaluation system was already in operation in the 
Prudential Insurance Company, where this study was 
done, a new evaluation system was developed for the 
study. The use of the system then in operation 
might have encouraged conscious or subconscious ad- 
justments of ratings in order to conform to existing 
official evaluations of the jobs studied. This would 
have violated the basic design of the study. 

A new job evaluation system was developed, consisting 
of 17³ job characteristics or requirements 
(e.g., mental requirements, experience requirements, 
physical demand, etc.). Five grades, from most to 
least, were written for each factor. 

The new system was applied to a sample of 82 
jobs in operation at the Prudential’s Western Home 
Office, Los Angeles, California. Jobs were selected 
so as to adequately represent the wide variety of 
work performed by nonmanagement jobs in this 
company. Three company employees, experienced in 
point job evaluation techniques, rated the 82 jobs on 
each of the 17 job requirements. Ratings were done 
under two conditions. 

First condition (unforced) ratings. Raters were in- 
structed to evaluate each job on each requirement 
separately, without regard to any total or over-all 
evaluation which might result. Letters rather than 
numbers were used to designate the descriptive grades 
in each job requirement, in order to eliminate the 
possibility that ratings on the various requirements 
could be summed to produce a total score for each 
job. 

Job-to-job comparisons for each requirement were 
encouraged, but raters were cautioned against allow- 
ing the over-all value of the job (either known or in- 
ferred) to influence their evaluations on the separate 
requirements. Raters were asked to work inde- 
pendently and not to compare evaluations at any 
time. The resulting evaluations were assumed to be 
as free from forcing as possible, and were denoted 
as “first condition” evaluations. 


2 It is the writer’s opinion that the effects of halo 
were great in the evaluations of this study. Raters 
were experienced in point job evaluation and seemed 
(to the writer) unable to retreat far from the concept 
of “over-all job value” in assigning unforced evaluations. 
If another study of this type were to be done, 
it is suggested that inexperienced raters be used, and 
that they be trained to reduce halo as much as possible. 

3 It was not necessary for purposes of this study 
to have as many as 17 requirements in the new system. 
However, a secondary purpose of the entire 
study was to determine by factor analytic techniques 
the basic factors actually operating in the evaluation 
of jobs in Prudential. In a situation of this type, 
the student of factor analysis knows that it is helpful 
to have a relatively large number of variables to assist 
in locating reference vectors in rotation and to help in 
interpreting factors which emerge. 
Second condition (forced) ratings. Arbitrary 
weights for each job requirement were developed by 
the raters and applied to the above first condition 
evaluations. This produced a total point score for 
each job. Raters were then informed of the actual 
level (actual pay grade, as determined by the evalua- 
tion system already in operation at the time of the 
study) of each job, furnished with a total-point-to- 
level conversion scale (a scale whereby point values 
are transformed into job levels, or pay grades), and 
asked to “adjust” first condition evaluations, where 
necessary, so that the final total point score for each 
job would conform to the actual job level on the 
conversion scale. They were instructed to make 
whatever changes seemed “most reasonable” in in- 
dividual job requirement ratings, even though job- 
to-job comparisons might be distorted. 
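
The scoring chain described above (letter grades, requirement weights, total points, conversion to a job level) can be sketched as follows. This is only an illustration: the study does not report its weights, its grade-to-number mapping, or its conversion scale, so every number and name below is a hypothetical stand-in.

```python
# Hypothetical values throughout: the weights, the numeric value of each
# letter grade, and the conversion scale are NOT reported in the study.
GRADE_VALUE = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}  # assumed: A = most


def total_points(ratings, weights):
    """Weighted sum of letter-grade ratings over the job requirements."""
    return sum(weights[req] * GRADE_VALUE[grade] for req, grade in ratings.items())


def job_level(points, scale):
    """Convert a total point score to a job level.

    `scale` is a list of (minimum_points, level) pairs; the level attached
    to the highest minimum that `points` reaches is returned.
    """
    level = None
    for minimum, lvl in sorted(scale):
        if points >= minimum:
            level = lvl
    return level


weights = {"education": 3, "experience": 4, "physical_demand": 1}
scale = [(0, 1), (15, 2), (25, 3), (32, 4)]

# First condition: ratings made without regard to any total.
unforced = {"education": "C", "experience": "B", "physical_demand": "D"}

# Second condition: suppose the job's actual level is 4; one rating is
# raised until the total converts to that level.
forced = dict(unforced, education="A")

for label, ratings in (("unforced", unforced), ("forced", forced)):
    points = total_points(ratings, weights)
    print(label, points, job_level(points, scale))
```

In this toy case a single changed rating is enough to move the job across a level boundary, which is the kind of adjustment the raters were asked to make.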

Raters were not informed of the reasoning or pur- 
pose behind second condition evaluations. They were 
told that the adjustments were necessary in order to 
provide certain additional statistical data which were 
needed. The resulting “forced” ratings were denoted 
as “second condition” evaluations. 

Both first and second condition evaluations were 
checked for reliability, intercorrelated, and factor 
analyzed, using Thurstone’s complete centroid method 
(12). Rotations were orthogonal to the strict cri- 
terion of simple structure (to avoid subjectivity in 
rotation as much as possible). Effects of adjusting 
or forcing evaluations were noted in terms of 
changes in rotated factor structure and factor loadings 
from first to second condition evaluations. Of 
particular interest were changes in loadings of job 
level on the factors which emerged, since all adjust- 
ments were made in relation to this value. 
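
For readers unfamiliar with centroid extraction, the first step can be sketched as below. This is a simplified illustration, not the full procedure of reference 12: it extracts one factor only, omits the sign reflections used before later factors, and omits rotation. The correlation matrix is a made-up example, and unities are placed in the diagonal purely for convenience.

```python
import math


def centroid_first_factor(r):
    """First-factor loadings by the centroid method.

    `r` is a square correlation matrix (list of lists) whose diagonal holds
    communality estimates. Each loading is a column sum divided by the
    square root of the grand total of the matrix.
    """
    n = len(r)
    col_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    return [s / math.sqrt(total) for s in col_sums]


def residual_matrix(r, loadings):
    """Correlations left over after removing the extracted factor."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


# Hypothetical 3-variable correlation matrix (not data from the study).
r = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]

loadings = centroid_first_factor(r)
print([round(a, 3) for a in loadings])
print(residual_matrix(r, loadings))
```

Later factors would be taken from the residual matrix in the same way, after reflecting variables so that the column sums are positive.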


Results 


Factor structure. Five factors emerged 
under both first and second conditions, as 
shown in Tables 1 and 2. The three predominant 
factors under the first condition 
corresponded closely with the three predomi- 
nant factors under the second condition. The 
remaining two factors were somewhat differ- 
ent, possibly enough to alter factor interpreta- 
tions. However, neither factor was sufficiently 
well defined to permit any definite conclusions. 

From first condition loadings, the factors 
were designated as follows: 

(A) Over-all Value: High loadings on 
nearly all job requirements. High correlation 
with total points, or job level. 

(B) Supervision: High loadings on require- 
ments involving direction of subordinates. 

(C) Physical Components: High loadings 
on requirements dealing with physical demand 
and working conditions. High loadings on 
education requirement also, but this requirement 
would have been split off if rotational 
criteria had included psychological meaningfulness. 

(D) Independent Action: Freedom or latitude 
in approaching job duties. Characterizes 
work which may be done independently, in the 
sense that a supervisor has latitude in how he 
handles group, etc. 

(E) Confidential Nature: Confidential nature 
of material normally handled in performing 
job duties. 


Table 1 

Rotated Factor Loadings: First Condition 

[Factor loadings are not legible in this copy; only the row labels survive.] 

 1. Mental Requirements 
 2. Frequency of Decisions 
 3. Difficulty of Judgment 
 4. Constant Attention to Details 
 5. Education 
 6. Experience 
 7. Effect of Inaccurate Work 
 8. Review on Work 
 9. Persuasion 
10. Importance of Contacts to Company 
11. Frequency of Contacts 
12. Variety in Work 
13. Confidential Nature of Work 
14. Working Conditions 
15. Physical Demand 
16. Number Supervised 
17. Difficulty of Supervision Given 
18. Job Level 

Note. Decimal points have been omitted. 


Table 2 

Rotated Factor Loadings: Second Condition 

[Factor loadings are not legible in this copy; the rows are the same 18 variables as in Table 1.] 

Note. Decimal points have been omitted. 


Contrary to expectations, forced as well as 
unforced evaluations yielded factor structures 
which were similar to those found in the 
majority of previous studies; that is, one 
principal factor (Over-all Value) showed high 
loadings on nearly all job requirements and 
explained nearly all the variance of job level. 

Factor loadings. In spite of the over-all 
similarity between first and second condition 
factor structures, factor loadings of job requirements 
showed some fairly substantial 
differences. These differences (between loadings 
of a job requirement on the same factor 
from first to second conditions) were greatest 
in the less well-defined factors which emerged. 

Table 3 indicates the job requirements 
which showed the greatest changes in loadings 
on each factor from first to second conditions 
(from Tables 1 and 2). 


Table 3 

Job Requirements Showing Greatest Changes in Factor Loadings from First to 
Second Conditions for Each Factor 

                       Requirement Showing      Factor Loadings 
Factor                 Greatest Change        1st Cond.   2nd Cond. 

Over-all Value         Education                 .30         .43 
Supervision            Physical Demand          -.32        -.13 
Physical Components    Diff. of Judg.            .08        -.06 
Independent Action     No. Supervised            .49         .26 
Confidential Nature    Education                 .13        -.28 


Thus, for the Over-all Value factor, the 
“Education” requirement showed the greatest 
change in factor loadings from first to second 
conditions, increasing from .30 to .43, respectively. 

Forcing also changed the job level variance 
explained by some of the factors which 
emerged. It can be seen from Tables 1 and 2 
that forcing ratings to conform to job level 
increased from .93 to .99 the correlation between 
job level and Over-all Value, the principal 
factor which emerged. This had the 
effect of increasing the job level variance explained 
by this factor from 86% under the 
first condition to 98% under the second. This 
increase was offset by decreases in job level 
variance explained by two other factors, Physical 
Components and Independent Action. Percentages 
of job level variance explained by 
the two remaining factors remained relatively 
constant. 
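
The step from the reported correlations (.93 and .99) to the reported percentages of variance (86% and 98%) is a squaring: with orthogonal factors, the share of a variable's variance explained by a factor is the square of its loading on that factor. A minimal check of the arithmetic, using only the two values quoted in the text (the function name is illustrative):

```python
def variance_explained(loading):
    """Proportion of variance explained by an orthogonal factor."""
    return loading ** 2


for condition, loading in (("first (unforced)", 0.93), ("second (forced)", 0.99)):
    share = variance_explained(loading)
    print(f"{condition}: loading {loading:.2f} -> {share:.0%} of job level variance")
```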


Discussion 


It should be noted that the changes from 
first to second conditions found in this study 
were not the same as the types of change 
which occur in the more usual experimental 
situation. In the latter, separate measure- 
ments are taken before and after the intro- 
duction of instructions or conditions, the ef- 
fects of which are being measured. In the 
present study, the measurements themselves 
which were obtained under the first condition 
were changed. 

Differences between forcing and “halo” are 
important to consider in this study. It is be- 
lieved that forcing represents an influence 
over and above that of halo. In the latter, it 
might be assumed that the rater is uninten- 
tionally influenced in his evaluations by an 
over-all impression of job worth. The situa- 
tion would be structured in such a way that 
he would have no ulterior motive for produc- 
ing a particular final or over-all rating, and he 
would be likely to do his best to rate ac- 
curately. 

In the case of forcing, however, there would 
be a more conscious desire to produce a par- 
ticular evaluation for a reason. For example, 
in evaluating jobs the proper organizational 
“fit” for a job could be achieved by forcing, if 
initial evaluation attempts did not place a 
job in “proper” relation to others. 
First condition (unforced) evaluations were 
undoubtedly influenced to some extent by 
halo, and this influence would have been 
carried directly over into second condition 
evaluations. However, only second condition 
evaluations should have been influenced by 
conscious forcing of the type described above. 
The change from first to second conditions 
should have resulted solely from forcing, and 
it was this phenomenon that was under con- 
sideration in this study. 

It is impossible to tell from this study alone 
the maximum effects that forcing can produce 
on a job evaluation system. It is presumed 
that the effects are at least as great, if not 
greater, in many job evaluation systems in 
actual operation. This study suggests that an 
investigator about to undertake a statistical 
study of job evaluation systems should be 
aware that forcing may have influenced evalu- 
ations already available in a plant or office, 
and he should plan his investigation accord- 
ingly. 

Summary 


Eighty-two jobs were evaluated by three 
raters on 17 job requirements or characteristics. 
Evaluations were done under two conditions: 
unforced and forced. Findings were: 

1. Five factors emerged in both unforced 
and forced evaluations. The three predominant 
factors in each were similar, the remaining 
two somewhat changed. 

2. Both forced and unforced evaluations 
yielded factor structures similar to those in 
most previous studies; i.e., one over-all factor 
explaining most of the variance of job level. 

3. Forcing had the effect of increasing the 
job level variance explained by the principal 
factor from 86% in unforced ratings to 98% 
in forced ratings. Factor loadings of some 
individual job requirements were also rather 
markedly affected by forcing. 


Received March 17, 1958. 


References 


1. Ash, P. A statistical analysis of the Navy’s 
method of position evaluation. Publ. Personn. 
Rev., 1950, 11, 130-138. 

2. Ebel, R. L. Estimation of the reliability of ratings. 
Psychometrika, 1951, 16, 407-424. 

3. Grant, D. L. An analysis of a point rating job 
evaluation plan. J. appl. Psychol., 1951, 35, 
236-240. 

4. Howard, A. H., & Schultz, H. G. A factor analysis 
of a salary job evaluation plan. J. appl. 
Psychol., 1952, 36, 243-246. 

5. Lawshe, C. H., Jr. Studies in job evaluation: II. 
The adequacy of abbreviated point ratings for 
hourly paid jobs in three industrial plants. J. 
appl. Psychol., 1945, 29, 177-184. 

6. Lawshe, C. H., Jr., & Alessi, S. L. Studies in job 
evaluation. IV. An analysis of another point 
rating scale for hourly paid jobs and the 
adequacy of an abbreviated scale. J. appl. 
Psychol., 1946, 30, 310-319. 

7. Lawshe, C. H., Jr., Dudek, E. E., & Wilson, R. F. 
Studies in job evaluation. 7. A factor analysis 
of two point rating methods of job evaluation. 
J. appl. Psychol., 1948, 32, 118-129. 

8. Lawshe, C. H., Jr., & Maleski, A. A. Studies in 
job evaluation. 3. An analysis of point ratings 
for salary-paid jobs in an industrial plant. J. 
appl. Psychol., 1946, 30, 117-128. 

9. Lawshe, C. H., Jr., & Satter, G. A. Studies in 
job evaluation. I. Factor analysis of point 
ratings for hourly-paid jobs in three industrial 
plants. J. appl. Psychol., 1944, 28, 189-198. 

10. Lawshe, C. H., Jr., & Wilson, R. F. Studies in 
job evaluation. 5. Analysis of the factor comparison 
system as it functions in a paper mill. 
J. appl. Psychol., 1946, 30, 426-434. 

11. Rogers, R. C. Analysis of two point-rating job 
evaluation plans. J. appl. Psychol., 1946, 30, 
579-585. 

12. Thurstone, L. L. Multiple-factor analysis; a development 
and expansion of The Vectors of 
Mind. Chicago: Univer. Chicago Press, 1947. 





Journal of Applied Psychology 
Vol. 42, No. 6, 1958 


Judgments of Speed on the Open Highway ¹ 


Abram M. Barch ² 


Michigan State University 


Certain adaptation effects are often reported 
with respect to the perception of the rate of 
speed at which one is moving while riding in 
an automobile. After riding at a relatively 
constant speed for a period of time, this speed 
does not appear as fast as it did at the begin- 
ning. Furthermore, travel at a rate of speed 
below this previous speed may seem extremely 
slow. 

Despite the common agreement on the ex- 
istence of speed adaptation effects and the 
long-standing interest of psychologists in the 
perception of motion, the conditions contribut- 
ing to such adaptation or the behavioral ef- 
fects of such alteration in phenomenal velocity 
have not been studied. Even the question of 
how reliably speeds of an automobile can be 
judged has received little attention (2, 5). 

Speed adaptation has been suggested as a 
contributing factor in traffic accidents, espe- 
cially those at the end of long tangent sections 
of roadway (3, p. 24). This notion has plau- 
sibility if it can be assumed that the effect of 
speed adaptation is to cause drivers to main- 
tain higher levels of speed than they would 
otherwise in situations where a lower speed is 
conducive to safety (e.g., curves, turn-off 
lanes, signalled intersections on rural high- 
ways). 

The present study was an exploratory one 
with two objectives: (a) to determine the ac- 
curacy with which judgments of speed could 
be made by the driver of a passenger car while 
decelerating; and (b) to determine the influence 
of increasing amounts of exposure to a 
given speed on these judgments. 


1 This study was conducted under a joint appoint- 
ment with the Department of Psychology and the 
Highway Traffic Safety Center of Michigan State 
University with funds and facilities provided by the 
Center. 

2 Acknowledgment is made of the assistance of 
Wayne Chubb, Peter Hemingway, John Nangle, and 
Thomas Trabasso in carrying out the experimenta- 
tion and data analysis. 
Experiment I 
Method 


Subjects. The Ss were 44 male volunteers enrolled 
in driver education courses at Michigan State Uni- 
versity, Summer, 1957. All were working toward a 
degree in education, but not necessarily in driver edu- 
cation. The results of four Ss were not used on the 
basis of an a priori policy of omitting the first S 
run by each E. Five more Ss were lost due to fail- 
ure to follow instructions, mechanical failure, or rain 
or wet highway during their run. The Ss ranged in 
age from 22 to 52, in years of driving at least 1000 
m.p.y. from 2 to 28, in average yearly mileage during 
the past two years from 4000 to 35,000 m.p.y., and 
in years of driver education experience from 0 to 8 
years. Median age was 28, median years of 1000 
m.p.y. was between 11 and 12, median yearly mile- 
age during past two years between 12,000 and 13,000. 
Eighteen had some driver education teaching experi- 
ence. 

Apparatus. The car used in the study was a 1956 
Chevrolet station wagon, in University service for 
one year (22,000 miles), and equipped with auto- 
matic transmission. The operation of the regular 
speedometer was modified so that it could be made 
to read zero, regardless of actual car speed, by the 
positioning of a mechanical switch located under the 
dash at the extreme right. The appearance of the 
regular speedometer and of the dash on the driver’s 
side remained unaltered. An auxiliary speedometer 
was mounted on the right front dash—easily visible 
from the right front seat but masked from the 
driver’s view by a light cardboard shield fitted about 
this speedometer. 

Two modifications were added for safety: (a) a 
yellow and black cloth sign, 3X14 ft., reading 
“CAUTION. Test car” was taped to the lower rear 
side of the car; (b) a flasher was inserted into the 
brake lights circuit so that E, by means of a manual 
switch, could cause the brake lights to flash repeat- 
edly during the test decelerations. 

Test areas. The main test area was a 10-mile con- 
crete four-lane divided section of U.S. 127 running 
from the edge of Holt, Michigan toward Jackson, 
Michigan and located 10 miles from Michigan State 
University. This section had only one signalled in- 
tersection (a yellow blinker) but did not have con- 
trolled access. At several points, mostly close to 
Holt, extra lanes were provided for turning move- 
ments. 

Eleven locations or “stations” were selected as 
deceleration points. Six of these stations (I-1, 2, 3, 
4, 5, and 6) were on the side of the highway leaving 
Holt and five (7, 8, 9, 10, and II-11) were on the 
side approaching Holt. The stations were chosen so 
as to have as level a roadway as possible and to be 
as far as possible from the end of curves and the 
crest of hills (with one exception, Station 6, which 
was on a high plateau just past the crest of a hill). 
Because of these limitations only two sets of stations 
were directly opposite each other: (a) I-1 and II-11; 
(b) 5 and 7. 

Two practice sets of speed judgments were ob- 
tained on a blacktop secondary road about two miles 
from the University. 

Procedure. The major series of speed judgments 
essentially required S to drive for 20 miles at 50 
mph, slowing briefly at deceleration stations during 
the drive in order to make speed estimates. 

From the starting point (at Holt) S accelerated to 
50 mph, held that speed for about 5 sec., decelerated 
indicating when he thought the car was at 40 and 
at 30 mph, and then used accelerator action to main- 
tain his estimated 30 mph. After S had maintained 
his estimated 30 mph for about 5 sec., he was told 
to accelerate to 50 mph and continue driving at that 
speed until further notice. He decelerated at five 
more stations on the outward run, crossed to the 
other side of the highway and decelerated five more 
times on the inbound run. The procedure for mak- 
ing speed judgments was the same at each decelera- 
tion station. The distance in miles to each of the 11 
deceleration stations was, respectively, 0.6, 1.4, 2.6, 
4.9, 8.5, 9.6, 11.4, 14.5, 17.5, 18.3, 20.0. The set of 
judgments obtained in this series will be called Judg- 
ment Sets 1-11. 
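
As a rough guide to what "increasing exposure" means in time, the station distances listed above can be converted to minutes of driving at the nominal 50 mph. This is only an approximation: it treats the whole distance as covered at exactly 50 mph and ignores the brief decelerations and re-accelerations at each station. The names below are illustrative.

```python
# Distances (miles) to the 11 deceleration stations, as listed in the text.
STATION_MILES = [0.6, 1.4, 2.6, 4.9, 8.5, 9.6, 11.4, 14.5, 17.5, 18.3, 20.0]
ADAPTING_SPEED_MPH = 50


def minutes_at_speed(miles, mph=ADAPTING_SPEED_MPH):
    """Minutes needed to cover `miles` at a constant `mph`."""
    return miles / mph * 60


for number, miles in enumerate(STATION_MILES, start=1):
    print(f"Judgment Set {number:2d}: {miles:4.1f} mi, "
          f"about {minutes_at_speed(miles):4.1f} min at 50 mph")
```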

Just prior to reaching each station, E disconnected 
the regular speedometer and instructed S to remove 
his foot from the accelerator. Brakes were never ap- 
plied during these decelerations. The regular speed- 
ometer remained inoperative after each deceleration 
until S had again accelerated to 50 mph. The regu- 
lar speedometer was operative all the time S was 
supposed to be driving at 50 mph. However, E as- 
sisted S in maintaining 50 mph by requests to in- 
crease or decrease speed. The tolerance range was 
48-52 mph. Additional requests were made near 
deceleration stations to keep the car within the 49-51 
mph range. 

Prior studies (2, 5) had stressed the unreliability 
of speed judgments. On the assumption that prac- 
tice in the experimental procedure would improve 
the reliability of the judgments, four sets of speed 
judgments were obtained prior to the main series of 
judgments. 

The procedural sequence for the study was as fol- 
lows: 

1. Instructions were given during drive by E to 
the practice site. These stressed an interest in how 
people perceive and judge speed while driving. Ad- 
aptation was not mentioned in any way. The safety 
precautions were described, and S assured that he 
could refuse to follow any instruction that he felt to 
be unsafe. 


2. For the first practice set (Set A) S accelerated 
the car to 35 mph, decelerated after 5 sec. at this 
speed, made judgments of 25 and 15 mph while decelerating, 
and then held his estimated 15 mph for 
about 5 sec. For the second practice set (Set B) S 
repeated the procedure for Set A but held the speed 
of 35 mph for 1.1 mile. Instructions to increase or 
decrease speed were given whenever the speed deviated 
from the 34-36 mph range. 

3. The car was then driven by E to the main test 
area over blacktop secondary roads at a speed of 
40-45 mph. 

4. Upon arrival at the main test area, the car was 
stopped and background information collected. A 
minimum of 4 min. of no motion was required; the 
median no-motion time was 4 min. 12 sec. 

5. Judgment Set I was then obtained at the same 
location (Station I-1) and through the same procedure 
as that for Judgment Set 1 of the main series 
(described above). After completing Judgment Set I, 
S crossed to the other side of the highway, accelerated 
to 50 mph driving toward Holt, held this speed 
for about 5 sec., decelerated at Station II-11 making 
judgments of 40 and 30 mph, and held his estimated 
30 mph for about 5 sec. 

6. The main series of judgments was then carried 
out. 

Judgment Sets I, II, and 6 were obtained in the 
left-hand lane since a cross-over to the other side of 
the highway occurred shortly after the completion of 
each of these judgment sets. The Ss were instructed 
about 0.2 mile from Station 6 to pull into the left 
lane and remain in that lane up to the cross-over 
point. All other speed judgments were made in the 
right-hand lane unless maintenance of the 50 mph 
speed required moving into the left lane to avoid 
slow moving traffic. 

Actual car speed was observed visually and re- 
corded to the nearest mile, with an occasional re- 
cording to the nearest half-mile, for all but 9 Ss. 
For these Ss, a motion picture camera mounted be- 
hind E and equipped for single frame exposure was 
used to record car speed at the moment of the 
deceleration request and at each judgment. Similar 
results were obtained by the two methods of recording 
and the data were combined. 

Experimental design. Judgment Sets I, II, and 1 
gave three measures of minimal exposure to 50 mph 
(about 5 sec.) while Judgment Sets 2 through 11 
were intended to demonstrate the effect of increasing 
amounts of exposure to this speed. This latter com- 
parison assumed that momentary decelerations from 
50 mph would not wholly negate any adaptation 
process. 

The limited experimentation and observation on 
the perception of real movements suggest that the 
boundary of the field in which motion is perceived 
can strongly influence judgments of velocity (e.g., 
1). Such a consideration means that the roadside 
features as well as the roadway itself should be as 
similar as possible at all stations, but especially in 
the area used for demonstrating the effect of minimal 
and maximal exposure to the “adapting speed” 
(Stations I-1 and II-11). It was for this reason 
that a course that doubled back on itself rather than 
a continuous 20-mile strip of highway was used. 
Judgment Sets I and II, in addition to their function 
as practice sets, furnished a check on the equivalence 
of the two sides of the highway close to Holt 
in terms of their influence on speed judgments after 
minimal exposure to 50 mph. 


Table 1 
Speed Judgments at the Main Test Area 

Experiment I 

Station          Mean Judgment,    50-30 Diff., 
(Judgment Set)      40 mph            Mean 

I-1 (I)              42.3             15.5 
I-1 (1)              41.8             15.4 
2                    41.4             15.3 
3                    42.0             15.5 
4                    41.5             15.4 
5                    41.5             15.7 
6                    40.3             18.3 
7                    42.2             15.1 
8                    42.1             15.5 
9                    41.8             15.1 
10                   41.4             15.7 
II-11 (II)           41.6             15.7 
II-11 (11)           40.4             16.8 

[The Experiment I columns for the mean 30 mph judgment and the S.D. of the 
difference scores, and the Experiment II columns, are only partly legible in 
this copy. Surviving entries that cannot be assigned to a row: 34.5, 15.6, 2.7; 
33.8, 16.1, 2.6; 33.0; 33.5.] 


Results 


Despite cautions, speeds immediately prior 
to deceleration at the 50 mph rate varied as 
much as 5 mph between different persons and 
different judgment sets. Since it was E’s re- 
sponsibility to maintain a car speed of 50 (or 
35) mph, there was no need for S to remain 
continuously aware of the car’s exact speed. 
It is reasonable to assume that S used the 
speed immediately prior to deceleration as his 
level for the 50 (or 35) mph and based his 
judgments accordingly. Therefore, the dif- 
ference for each individual in each judgment 
set between the actual speed just prior to de- 
celeration and the speed judged to be 30 (or 
15) mph was used as our measure of esti- 
mated speed instead of the judged 30 (or 15) 
mph speed itself. 
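
The difference score defined above can be written out as a small sketch. The trial values are hypothetical, not data from the study, and the function names are illustrative. The nominal difference is 50 - 30 = 20 mph, so a score below 20 means that S called "30" while the car was still travelling faster than the speed implied by his starting point.

```python
NOMINAL_DIFFERENCE = 50 - 30  # mph between the adapting and target speeds


def difference_score(speed_before_deceleration, speed_judged_as_target):
    """Actual speed just before deceleration minus the actual speed at the
    moment S judged the car to be at the target speed."""
    return speed_before_deceleration - speed_judged_as_target


def shortfall(score, nominal=NOMINAL_DIFFERENCE):
    """How far the score falls short of the nominal difference (mph).

    A positive shortfall means S reported the target speed while the car
    was still above it, i.e., actual speed was underestimated.
    """
    return nominal - score


# Hypothetical trial: car at 49.5 mph before deceleration, and at 34.0 mph
# when S reported "30".
score = difference_score(49.5, 34.0)
print(score, shortfall(score))
```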

Similar results followed from the use of 
either score, but the difference scores had a 
higher reliability from one judgment set to 


another. The speeds obtained by Ss in at- 
tempting to maintain 30 (or 15) mph by ac- 
celerator action were not used in the analysis 
because it was found that E was often uncertain 
as to whether S had reached a stable 
held speed. 

The results obtained for Practice Sets A and 
B are similar to those obtained at the main 
test area and are omitted to conserve space. 

Table 1 presents the mean car speed judged 
equivalent to 40 and 30 mph, the mean 50-30 
difference score, and the standard deviation 
of the difference scores for the judgment sets 
obtained in the main test area. The presence 
of speed adaptation, under the meaning of 
the term as used here, would be shown by a 
significant decrease in the mean 50-30 differ- 
ence score from early to later judgments— 
essentially, a significant increase in the car 
speed reported equivalent to 30 mph as the 
amount of exposure to 50 mph is increased. 

Table 2 presents the results of an analysis 
of variance of the 50-30 difference scores for 
Judgment Sets I, II, and 1 through 11. (Five 
missing observations were estimated as sug- 
gested by Snedecor [6, pp. 310-313].) The 
sequential Q technique tabled by Snedecor 
(6, p. 251) was used to make comparisons 
between individual judgment sets. The Judg- 
ment Sets X Subjects interaction was used for 
estimating the standard error of the difference 
for each comparison. 
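
The degrees of freedom and F ratios implied by Table 2 can be checked with a short sketch. The mean squares and df are those printed in the table; the number of Ss (35) is an assumption inferred from the 34 df for subjects, and the variable names are illustrative.

```python
# Design figures: 13 judgment sets (I, II, and 1 through 11).
n_sets = 13
n_subjects = 35   # ASSUMPTION: inferred from 34 df for subjects
n_missing = 5     # missing observations that were estimated

df_sets = n_sets - 1
df_subjects = n_subjects - 1
# Interaction df, reduced by one for each estimated observation.
df_error = df_sets * df_subjects - n_missing
df_total = df_sets + df_subjects + df_error

# Mean squares as printed in Table 2.
ms_sets, ms_subjects, ms_error = 22.43, 136.79, 5.36

# The Judgment Sets X Subjects interaction serves as the error term.
f_sets = ms_sets / ms_error
f_subjects = ms_subjects / ms_error
print(df_error, df_total, round(f_sets, 2), round(f_subjects, 2))
```

Under these assumptions the df reproduce the 403 and 449 shown in the table.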

The difference score for Judgment Set 6 was 
significantly larger (.05 level) than all others; 
the difference score for Judgment Set 11 was 
significantly larger (.05 level) than those of 
Judgment Sets 7, 9, and 10. A demonstration 
of speed adaptation would require the differ- 
ence scores for the later judgments to be sig- 
nificantly smaller than those of the earlier 
judgments. 

Underestimation of actual speed was ob- 
tained throughout the experiment. Table 1 
clearly shows the underestimation of actual 
speed for both the 40 and 30 mph judgments. 
This underestimation could be the result of a 
general tendency to underestimate speeds less 
than 50 mph. It is more likely due to a more 
or less instantaneous contrast effect resulting 
from the previously experienced higher speed. 
(A reduced degree of underestimation or even 
an overestimation might be expected from 
this last hypothesis if Ss were to make speed 
judgments while accelerating instead of de- 
celerating.) At any rate, such underestima- 
tion provides no evidence for a phenomenon 
of speed adaptation since it did not increase 
with increasing exposure to the “adapting” 
speed. 

Of some interest is the fact that the mean 
actual speed judged equivalent to 40 mph was 
essentially a bisection of the gap in mph be- 
tween the speed just prior to deceleration and 
the speed judged equivalent to 30 mph. The 
ratio of the mean 50-40 difference score to 
the mean 50-30 difference score for each judg- 
ment set varied from .46 to .53. 

Table 2 
Analysis of Variance of 50-30 Difference 
Scores of Experiment I 

                                    Mean 
Source                       df    Square 

Judgment sets                12     22.43 
Subjects                     34    136.79 
Judgment sets X subjects    403†     5.36 
Total                       449 

* Significant at .01 level. 
† Five df lost due to missing observations. 


Table 3 
Correlation of Difference Scores from 
Various Judgment Sets 


Table 3 lists the correlations between the 
difference scores for various judgment sets. 
The correlations, in general, are somewhat 
higher than might be expected in view of the 
nature of the task and the relative unfamili- 
arity of Ss with the car and the road. Corre- 
lational studies of trial-by-trial changes in the 
learning of perceptual motor skills have found 
that increasing practice results in a decrease 
in relationship between scores made on early 
trials and those made on later ones and an 
increase in the relationship between adjacent 
trials (4). Evidence for both trends was 
found here. 

Striking individual differences between Ss 
were found with the average 50-30 difference 
score for an individual ranging from 9.5 to 
24.2 mph. However, no significant relation- 
ship was found between mean difference scores 
and any of the available biographical charac- 
teristics such as age, driving mileage, years 
of driving, and driver education experience. 


Experiment II 
Method 


Subjects. The Ss were 13 male volunteers enrolled 
in the same driver education courses as those of Ex- 
periment I. Median age was 30, median years of 
1000 m.p.y. was 14, and median yearly mileage dur- 
ing past two years was 11,000. Nine had some driver 
education teaching experience. 

Procedure. The procedure for Experiment II was 
the same as that for Experiment I with one major 
modification. It was felt that the absence of speed 
adaptation effects noted in Experiment I might be 
due to (a) the use of too short a period of con- 
tinuous speed; (b) too many judgments; (c) both 
the brevity of the period of continuous speed and the 
presence of too many judgments. Therefore, Judgment 
Sets 2, 3, 4, 5, 8, 9, and 10 were omitted. 

After making a set of judgments at Station I-1 and 
returning to 50 mph the Ss drove continuously at 
that speed for a median time of 8 min. 35 sec. before 
making a set of judgments at Station 5. A set of 
judgments was made at Station 7 and at II-11, with 
a median time interval of 8 min. 50 sec. at 50 mph 
between these locations. 

The analysis of variance of the 50-30 difference 
scores for the six judgment sets failed to reach sig- 
nificance at the .05 level (F = 1.24 for 5 and 60 df). 
No evidence for speed adaptation was obtained de- 
spite the decrease in number of judgments and the 
longer period of constant speed. 

Inspection of Tables 2 and 3 indicates that quite 
similar results were obtained from the two experi- 
ments with the minor exception of the “end” effect 
found for Judgment Set 11 in the first experiment 
only. 


Discussion 
The two studies agreed in finding no evi- 
dence for speed adaptation in the speed judg- 
ments made by drivers while decelerating un- 
der the conditions of the studies. The first 


experiment found some evidence for the op- 
posite influence on speed judgments when the 
Ss were approaching a point where they were 
going to come to a stop. However, this effect 
was not noted for Judgment Sets I and II 
where the Ss also knew a stop would follow. 


The effect for Judgment Set 11 was also not obtained 
in the second experiment. 

There are a number of ways in which the 
procedure might have obscured the presence 
of adaptation effects. For one thing, S had 
ample opportunity to adjust his phenomenal 
impression of 50 mph as the experiment pro- 
ceeded and the adjustment of this anchoring 
point might have influenced his whole scale. 
However, a small pilot study conducted prior 
to the experiments reported here had Ss drive 
for relatively short time intervals but with the 
speedometer masked at all times.⁸ Ss were 
required to attain and maintain various speeds 
without specific knowledge of any speed. No 
evidence for speed adaptation was found. 

8 Unpublished study conducted by Don Trumbo 
and Peter Hemingway with equipment furnished by 
the Highway Traffic Safety Center. 

Other tenable hypotheses are that a speed 
adaptation requires longer periods of constant 
speed, constant speed higher than the rates 
used here, or both longer periods and higher 
speeds. It is also possible that speed adapta- 
tion may be far more characteristic of the 
passenger rather than the driver in that the 
driver in his coping with various road situa- 
tions may obtain subsidiary nonconstant in- 
formation about car speed. 


Summary 


Male adult drivers, while decelerating on 
the open highway, were required to make 
judgments about the speed of the passenger 
car they were driving after varying amounts 
of exposure to a constant speed of 35 or 50 
mph. 

The accuracy and consistency of the judg- 
ments and the influence of varying amounts 
of exposure on these speed judgments (speed 
adaptation) were studied. Such speed judg- 
ments were found to be quite reliable and ap- 
parently independent of speed adaptation. 


Received December 18, 1957. 
Several studies (1, 2, 4, 6, 7) have reported 
the results of having groups of job applicants 
or industrial workers rank lists of 
verbal statements of job incentives. One 
major criticism of such studies is that the 
verbal job incentives used are not selected on 
the basis of any theoretical framework of 
hypothesized dimensions of job incentives, but 
each incentive is arbitrarily assumed to meas- 
ure a dimension that is independent of the 
other incentive-measured dimensions in the 
sample. This absence of a unifying theory 
also means that variations in the wording of 
similar incentives used in different studies 
weaken any interstudy comparisons. If it 
can be shown that several different verbal in- 
centive statements are measuring the same 
fundamental dimensions, then an approach 
can be made toward developing a taxonomy 
of independent job incentive dimensions that 
would aid in unifying research. At the empirical 
level, both the methods of factor analysis 
and of content analysis provide techniques 
by which hypothesized dimensions can be 
identified and explored. 

Our interest is in isolating such dimensions 
of job incentives among groups of under- 
graduate college students as an unexplored 
research area related to occupational choice 
among students and in the hope that a dimen- 
sional taxonomy developed from these more 
easily accessible Ss may eventually be sug- 
gestive of similar studies with broader samples 
of industrial personnel. As a basis for initial 
hypotheses about such dimensions, we as- 
sumed that recent developments in personality 
theory would offer some suggestions. The 
work of McClelland et al. (5) indicates that 
much of the academic performance of college 
Ss can be related to motivational dimensions 
such as “need for achievement” and “fear of 
failure.” Many of the typical job incentive 
statements used in previous research can be 
ordered along a dimension that uses these two 
McClelland constructs to define opposite poles 


of this dimension. Opportunity for advance- 
ment, promotion for initiative, and similar in- 
centives would lie along the “need achieve- 
ment” segment of this dimension, while job 
security, benefits, and working conditions 
would fall at the “fear of failure” end. Con- 
sequently, a “need achievement vs. fear of 
failure” dimension seems a reasonable begin- 
ning hypothesis. 

An obvious second dimension of job incen- 
tives for college Ss is a “social service” need 
to help and assist other people. In selecting 
an occupational goal some college Ss appear 
to be more concerned with whether they will 
have an opportunity on the job to satisfy this 
need than they are with other more traditional 
job incentives. In an increasingly “other- 
directed” industrial society this dimension 
may become highly important in classifying 
job incentives. 

The study reported below is the first stage 
in a projected series aimed at developing a 
taxonomy of job incentives. We decided to 
first test the adequacy of the proposed meth- 
odology within a deliberately limited area of 
job incentives as an exploratory study of one 
hypothesized dimension. It was hoped that 
an application of the method would provide 
additional hypotheses for later developmental 
research studies. 


Procedure 


Incentives. A list of the verbal descriptions of job 
incentives used in previous studies (1, 2, 4, 6, 7) 
was prepared and several incentives constructed by 
the present authors were added. We decided to include 
on our list only those that might, on an a 
priori basis, be expected to measure the hypothesized 
“need achievement vs. fear of failure” dimension 
and would also be applicable to college student Ss. 
Incentives apparently measuring other dimensions, 
particularly the “social service” variable noted above, 
were excluded to provide as homogeneous a list of 
incentives as possible and to reduce the factorial 
complexity of the interrelationships among the in- 
centives. The following eight incentive statements 
were selected: 
1. Opportunity to learn new skills 
2. Friendly fellow workers 
3. Freedom to assume responsibility 
4. Good job security 
5. Good prospects for advancement 
6. Full insurance and retirement benefits 
7. Recognition from supervisors for initiative 
8. Good salary 


Incentives 1, 3, 5, and 7 were included to represent 
the “need achievement” pole of the hypothesized 
dimension, while Incentives 2, 4, 6, and 8 were selected 
to define the “fear of failure” pole. A form was 
prepared to collect Ss’ rankings of these eight in- 
centives. The form requested the S to (a) record 
his name, age, sex, curriculum group or school, and 
major subject, (b) write a brief description of the 
specific job or occupation toward which his college 
preparation was oriented, (c) rank the eight incen- 
tives in terms of how important each incentive will 
be in selecting the job the S had described above 
with the most important incentive being ranked 
“one” and the least important incentive ranked 
“eight,” and (d) if the S felt that a very important 
(to him) incentive had been omitted from the list, 
he was asked to write a brief description of the in- 
centive at the bottom of the form. It was hoped 
that this last procedural task would permit Ss to 
volunteer omitted incentives such as the “social 
service” dimension and, through a content analysis 
of the volunteered statements, provide data for 
hypotheses about other important dimensions. 

Subjects. The ranking form was distributed to 
267 student Ss (174 men and 93 women) in 10 sec- 
tions of an introductory psychology course. All Ss 
filled in the form in class as requested by their in- 
structor who had announced that the department of 
psychology was making a survey of the job incen- 
tives important for college students. Approximately 
one third of the Ss were pre-education students, 
another third were majoring in the humanities, social 
sciences, or natural sciences, while the remaining Ss 
were divided between engineering and business ad- 
ministration students. 


Results 


The average rank-difference correlation 
among the 267 ranking Ss (each S correlated 
with every other S and these 35,511 intercor- 
relations averaged) was .20 which is signifi- 
cant at the .01 level of confidence. 
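
The count of 35,511 intercorrelations is simply the number of distinct pairs among 267 Ss. The following sketch, offered as an illustration rather than as the authors' computing routine, shows the pair count and an average rank-difference correlation for a few hypothetical rankings. It assumes untied ranks, for which the rank-difference coefficient is 1 - 6Σd²/[n(n² - 1)]; the function names are illustrative.

```python
from itertools import combinations
from math import comb


def rank_difference_r(x, y):
    """Rank-difference (Spearman) correlation for two untied rankings."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def mean_pairwise_r(rankings):
    """Average the coefficient over every distinct pair of Ss."""
    pairs = list(combinations(rankings, 2))
    return sum(rank_difference_r(a, b) for a, b in pairs) / len(pairs)


# Number of distinct pairs among the 267 ranking Ss.
n_pairs = comb(267, 2)

# Hypothetical rankings of the eight incentives by three Ss.
demo = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [2, 1, 4, 3, 6, 5, 8, 7],
    [8, 7, 6, 5, 4, 3, 2, 1],
]
demo_mean = mean_pairwise_r(demo)
print(n_pairs, round(demo_mean, 3))
```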

A sample of 100 ranking Ss was randomly 
drawn from the total group of 267 Ss without 
regard to any demographic variables such as 
sex, age, or curriculum grouping. Each S’s 
ranking of the eight incentives was dichoto- 
mized by scoring the incentives ranked from 
1 to 4 as “one” and the incentives ranked 5 
through 8 as “zero.” Tetrachoric correlation 


coefficients were then computed among the 
eight incentives. As would be expected, the 
resulting matrix of 28 intercorrelations tended 
to be negative with the mean correlation be- 
ing — .21 and the individual coefficients rang- 
ing from .24 to — .53. This matrix was fac- 
tor analyzed by the centroid method and three 
factors were extracted. The analysis was rep- 
licated twice to stabilize the communality esti- 
mates. Inspection of the residual correlations 
indicated the absence of a fourth factor since 
the median absolute residual after extracting 
the third orthogonal factor was .12. The 
three factors were rotated to achieve orthogo- 
nal simple structure and, as far as possible, 
positive manifold. Because of the generally 
negative intercorrelations among the incen- 
tives, the factors tended to be bipolar. 
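
A rough sketch of the dichotomizing step and of a tetrachoric estimate follows. The cosine-pi formula used here is a common hand approximation to the tetrachoric coefficient and is substituted for illustration; the paper does not say which computing method was used. The function names and the demonstration vectors are hypothetical.

```python
import math


def dichotomize(ranking):
    """Score incentives ranked 1 to 4 as one and those ranked 5 to 8 as zero."""
    return [1 if rank <= 4 else 0 for rank in ranking]


def tetrachoric_cos_pi(x, y):
    """Cosine-pi approximation to the tetrachoric r for two 0/1 variables."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    if b * c == 0:
        # No discordant pairs in one direction: the estimate is at the
        # upper limit, or undefined when the concordant product is also zero.
        return float("nan") if a * d == 0 else 1.0
    return math.cos(math.pi / (1 + math.sqrt(a * d / (b * c))))


# One hypothetical S's ranking of the eight incentives.
print(dichotomize([3, 5, 1, 8, 2, 7, 4, 6]))

# Hypothetical dichotomized scores on two incentives for eight Ss.
incentive_x = [1, 1, 1, 1, 0, 0, 0, 0]
incentive_y = [1, 1, 1, 0, 0, 0, 0, 1]
print(round(tetrachoric_cos_pi(incentive_x, incentive_y), 3))
```

Because every S places exactly four incentives in the upper half, a high score on one incentive forces low scores elsewhere, which is why the intercorrelations tend to be negative.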

The results of this factor analysis (N = 
100) can be found in Table 1 along with the 
mean rank for each incentive (N = 267). 


The three factors accounted for 56 per cent 
of the interincentive variance with 10 of the 
24 loadings having absolute values of .40 or 
above and 10 of the loadings falling at .20 or 
below. 

Table 1 
Mean Ranks (N = 267) and Factor Loadings (Decimal 
Points Omitted and N = 100) for 
Eight Job Incentives 

                                 Rotated Factors 
                       Mean 
Incentives             Rank      A      B      C 

1. New skills           4.4    -69            07 
2. Fellow workers       4.8     27            07 
3. Responsibility       3.8    -53            24 
4. Job security         3.6     32            84 
5. Advancement          3.4     14           -59 
6. Job benefits         6.7     64            08 
7. Initiative           5.4     03           -54 
8. Salary               3.8     20 

Percentage of 
Total Variance                  18 


Factor A appears to be the “need achievement 
vs. fear of failure” variable originally 
hypothesized in selecting the incentives. Incentives 
1, 3, 5, and 7 (“need achievement”) 
had a median loading on Factor A of -.25, 
while Incentives 2, 4, 6, and 8 (“fear of failure”) 
showed a median loading of .30 with 
no overlap between these two subgroups of 
incentives in Factor A loadings. Incentives 4 
and 6 defined the “fear of failure” pole while 
Incentives 1 and 3 best measured the “need 
achievement” end of the Factor A dimension. 
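
The median loadings quoted above, and the share of variance attributable to Factor A, can be checked from the Factor A column of Table 1. The sketch below takes the loadings as read from that table with decimal points restored. The percentage is computed as the mean squared loading, which is the usual convention for orthogonal factors and is assumed here rather than stated in the text.

```python
from statistics import median

# Factor A loadings for Incentives 1 to 8, as read from Table 1.
factor_a = {1: -0.69, 2: 0.27, 3: -0.53, 4: 0.32,
            5: 0.14, 6: 0.64, 7: 0.03, 8: 0.20}

need_achievement = [factor_a[i] for i in (1, 3, 5, 7)]
fear_of_failure = [factor_a[i] for i in (2, 4, 6, 8)]

median_need = median(need_achievement)
median_fear = median(fear_of_failure)

# No overlap: every "need achievement" loading lies below every
# "fear of failure" loading.
no_overlap = max(need_achievement) < min(fear_of_failure)

# Percentage of total variance as the mean squared loading (assumed convention).
pct_variance_a = 100 * sum(v * v for v in factor_a.values()) / len(factor_a)
print(median_need, median_fear, no_overlap, round(pct_variance_a, 1))
```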

Incentives 2 and 3 fell at one end of the 
Factor B dimension while Incentives 5 and 8 
measured the opposite factor extreme. As an 
a posteriori hypothesis we might characterize 
this dimension an “interest in the job itself 
vs. the job as an opportunity for acquiring 
status” factor from an inspection of the in- 
centives defining the factor poles. We would 
expect Ss clustering at the first end of this 
factor to be concerned with whether a given 
job would be personally important to them 
and with the personal characteristics of the 
people with whom they would be working. 
The Ss at the other end of this dimension 
would regard the job as a spring-board for 
upward mobility in terms of income, author- 
ity, and job title. However, since we did 
not predict in advance the appearance of such 
a factor we can offer this interpretation only 
as an hypothesis for future research. 

Factor C was bipolar, with Incentive 4 defining 
the positive end of this dimension and 
Incentives 5 and 7 falling at the negative pole. 
Factor C is difficult to “name,” but we might 
hazard the guess that it concerns the attitude 
of the S toward his supervisors or employers. 
The S at the positive pole of Factor C wants 
to be autonomous and secure in his job, while 
the S at the other pole needs recognition and 
advancement from his immediate supervisor. 
Tentatively, we might label this a “job au- 
tonomy vs. supervisor dependent” factor. 

For later research it would be desirable to 
have a single score measure of these three 
ranking factors even though such a score 
would be somewhat crude and of low validity 
at the present stage of research. By examin- 
ing the ranking factor loadings in Table 1, it 
can be seen that the most valid and factorily 
pure measure of Factor A would be obtained 
by computing for each S the difference be- 
tween his rankings of Incentives 6 and 1. 
Similarly a factor “score” for Factor B can 
best be obtained by subtracting his ranking 
of Incentive 2 from his ranking of Incentive 
8 and a Factor C score found by finding the 
ranking of Incentive 7 minus his ranking of 
Incentive 4. Positive factor scores computed 
in the above manner should reflect high need 
achievement, high interest in the job, and high 
need for autonomy from supervision. Nega- 
tive factor scores would measure high fear 
of failure, strong attitude toward the job as 
a stepping-stone for advancement, and high 
need for a dependency relation to the super- 
visor. 
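
The three difference scores described above can be written out as a short sketch. The function name and the example ranking are hypothetical; the arithmetic follows the text, with a rank of 1 denoting the most important incentive.

```python
def factor_scores(ranks):
    """Crude factor scores from one S's ranks.

    `ranks` maps incentive number (1 to 8) to the rank S assigned it,
    where 1 is the most important incentive.
    """
    return {
        "A": ranks[6] - ranks[1],  # positive: high need achievement
        "B": ranks[8] - ranks[2],  # positive: high interest in the job itself
        "C": ranks[7] - ranks[4],  # positive: high need for autonomy
    }


# Hypothetical S who ranks "new skills" first and "job benefits" last.
example = {1: 1, 2: 6, 3: 2, 4: 7, 5: 3, 6: 8, 7: 4, 8: 5}
scores = factor_scores(example)
print(scores)
```

Each score can range from -7 to +7, since it is the difference between two distinct ranks drawn from 1 through 8.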

Seventy-seven of the 267 Ss who ranked the 
incentives, or 29 per cent, volunteered addi- 
tional job incentives that they felt were not 
covered by the eight that they ranked. There 
was no sex difference in the percentage volun- 
teering incentives with 50 of the 174 male Ss 
contributing (29 per cent) and 27 of the 93 
female Ss giving additional incentives (29 per 
cent). Incomplete data suggested a possible 
verbal ability difference between the Ss who 
did and did not contribute incentives. Raw 
scores on a 30-item synonym test (five-choice 
items taken from the Cooperative Vocabulary 
Test) were available for 90 male Ss, 23 of 
whom had contributed incentives while 67 had 
not. The mean score of the contributors was 
15.5 and the mean score of the noncontribu- 
tors was 13.4. This difference in mean vo- 
cabulary scores was significant at the .05 level 
of confidence (t = 2.05). 

The 77 contributed incentives were trans- 
ferred to cards and an attempt was made to 
develop incentive categories from the mani- 
fest content of the statements. The con- 
tributed incentives were quite heterogeneous 
and only three identifiable categories, ac- 
counting for 58 per cent of the statements, 
contained a reasonable percentage of the con- 
tributed incentives. These three categories, 
plus a basket “miscellaneous” category, were: 

1. Opportunity to Help Others (25 per 
cent): opportunity to assist and guide chil- 
dren, to care for the physically ill and handi- 
capped, and to help people to adjust in their 
jobs and daily lives. 

2. Job Satisfaction (19 per cent): feeling 
satisfied with the type of job and enjoyment 
of the daily activities in a particular field. 

3. Job Interest and Variety (14 per cent): 
the amount of stimulation and challenge provided 
by the job and the relative absence of 
routine and repetitive tasks. 

4. Miscellaneous (42 per cent): job loca- 
tion and physical facilities, social status and 
community recognition of the job, independ- 
ence of authority, job mobility, etc. 


Table 2 
Numbers of Students Volunteering Job Incentives 
in Four Derived Categories 

Incentive Categories              Men   Women   Total 

1. Opportunity to help others            12 
2. Job satisfaction                       6 
3. Job interest and variety               4 
4. Miscellaneous                          5 

Total                                    27 

The presence of an important “social serv- 
ice” category (Category 1) among the con- 
tributed incentives is hardly surprising in 
view of the discussion at the beginning of 
this paper. Of interest is the appearance of 
a “job satisfaction” category as an independ- 
ent job incentive since most industrial psy- 
chologists have used the term “job satisfac- 
tion” as the desirable end result of satisfying 
the needs of the employee by supplying the 
important incentives in the job situation. 
Many of our Ss apparently viewed “job satis- 
faction” as a more limited factor which was 
related to the day-to-day behavioral require- 
ments of the job. Category 3 (job interest 
and variety) may be peculiar to college Ss 
and to more intelligent employees who tradi- 
tionally abhor dull and monotonous job re- 
quirements. 

Although no sex difference was found in the 
total percentages of men and women Ss con- 
tributing incentives, inspection of Table 2, 
which shows the number of Ss in each sex 
group contributing incentives in each of the 
four derived categories, indicates a pronounced 
sex difference in Categories 1 and 4. 
Almost one half of the incentives contributed 
by the women fell in Category 1, while less 
than one seventh of the male incentives fell in 
this category. The men contributed many 
more diverse and heterogeneous incentives 
than did the women with over one half of the 
male incentives falling in Category 4 and less 


than one fifth of the female incentives being 
grouped in this same basket category. The 
chi-square value for this contingency table was 
11.9 which, with three degrees of freedom, is 
significant at the .01 level. No sex differences 
for Categories 2 and 3 are evident with al- 
most identical percentages of male and fe- 
male incentives falling in these two categories. 
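
The chi-square computation for a 2 x 4 contingency table of this kind can be sketched as follows. The women's counts are taken as the Women column of Table 2. The men's counts are ASSUMED for illustration only: they were chosen to sum to the 50 contributing men and to agree roughly with the proportions described in the text, and they are not the published figures, so the statistic need not reproduce the reported 11.9 exactly.

```python
def chi_square(table):
    """Pearson chi-square statistic and df for a rows x columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Tabled chi-square value for 3 df at the .01 level.
CRITICAL_01_DF3 = 11.345

women = [12, 6, 4, 5]   # Categories 1 to 4, Women column of Table 2
men = [7, 9, 7, 27]     # ASSUMED illustrative counts, not from the source

stat, df = chi_square([men, women])
print(round(stat, 2), df, stat > CRITICAL_01_DF3)
```

With three degrees of freedom, any statistic above the tabled value of 11.345 is significant at the .01 level, which is consistent with the conclusion drawn in the text.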


Discussion 


The results of this exploratory study indi- 
cate that the ranking methodology used offers 
a promising approach in studying the di- 
mensions of job incentives. However, fac- 
tor analyses of incentive rankings should be 
expanded by including in the sample of in- 
centives verbal statements reflecting job in- 
centives along more dimensions than were 
included in the present sample. Additional 
incentive statements can be written to describe 
Factors B and C and the content analysis 
categories found in the present study and a 
factor analytic study of the expanded list of 
incentive statements would provide evidence 
as to the adequacy of our interpretations of 
these dimensions. Such an expansion of the 
method also would clarify the descriptions of 


the factors and permit the development of 
more reliable and factorily more valid fac- 


tor scores. The random adding of incentive 
statements that are not derived from hy- 
pothesized dimensions would be a shotgun 
approach that would, in the long run, retard 
the development of an adequate taxonomy of 
incentives. 

The factor analysis extracted three orthogo- 
nal factors which were tentatively identified 
as: A. Need achievement vs. fear of failure; 
B. Interest in the job itself vs. the job as an 
opportunity for acquiring status; and C. Job 
autonomy of supervisor vs. supervisor depend- 
ency. The content analysis of the contributed 
incentives resulted in three further categories: 
1. Opportunity to help others; 2. Job satis- 
faction; and 3. Job interest and variety. Al- 
though the three factor dimensions are inde- 
pendent of each other, we cannot assume that 
the content categories are independent or that 
the categories are uncorrelated with the fac- 
tors. For example, a reasonable hypothesis 
would suggest that Factor B is the same dimension 
as Categories 2 and 3. Another hypothesis 
is that bipolar Factor A may split 
into two factors when additional incentive 
statements are included in the sample to be 
ranked. These hypotheses should be tested 
in the next developmental stage of research 
on job incentives. 

It must be emphasized that our results sug- 
gesting certain incentive dimensions and our 
interpretations of the dimensions are highly 
tentative and exploratory. At this prelimi- 
nary stage of research it seems wiser to work 
with relatively limited successive samples of 
Ss, constantly revising procedure and testing 
hypotheses derived from preceding samples, 
than to attempt a single-shot study with a 
large sample of incentive statements and a 
huge sample of heterogeneous Ss. We antici- 
pate that our present interpretation of these 
few dimensions will have to be markedly re- 
vised and expanded as research information 
accumulates. 

Finally, the results and suggestions for fu- 
ture research can be generalized only to the 
population of undergraduate college students 
and cannot, at the present time, be applied to 
industrial situations. Dimensions that appear 
among job incentives ranked by college Ss 
may not appear in the same form or appear 
at all in similar research with industrial work- 
ers. Similarly, other dimensions may be im- 
portant in industry that fail to appear in col- 
lege samples of Ss. Only additional research 
can delimit the generalizability of results 
found for our limited sample. 


Summary 


College Ss were asked to describe their job 
goal at graduation and to rank eight selected 
job incentive statements as to their importance 
in choosing the job. A factor analysis 
of intercorrelations (N = 100) among the 
ranked incentives yielded three factors with 
the factors being tentatively identified as: 
need achievement vs. fear of failure, interest 
in the job vs. the job as an opportunity for 
acquiring status, and job autonomy of super- 
vision vs. supervisor dependency. A content 
analysis of incentive statements contributed 
by 29 per cent of the ranking Ss (N = 267) 
gave three major categories: opportunity to 
help others, job satisfaction, and job interest 
and variety. 
In recent years a number of writers (3, 4, 
5, 6) have expressed concern over the adverse 
effects of simplified job content on workers’ 
attitudes. Only a few empirical studies (2, 
7, 8) have been addressed to this problem 
and all support the view that the results of 
job simplification may be a source of indus- 
trial conflict. 

Since job simplification is still accepted, 
rightly or wrongly, as a cardinal principle of 
industrial management in many types of 
work situations, it becomes of interest to ex- 
plore further the kinds of simplification or 
enlargement of job content that are associated 
with changes in workers’ attitudes and opin- 
ions. 

In the course of conducting a broader per- 
sonnel research project, a survey of workers’ 
opinions toward their supervisor and the gen- 
eral work situation was administered to sam- 
ples of hourly operators in four production 
departments of an automotive assembly plant. 
In two of these departments workers were 
performing on two types of jobs of clearly 
different content. In the other two depart- 
ments workers were also performing on these 
two types of jobs, but the content of one of 
the jobs recently had been modified. This 
combination of circumstances permitted ex- 
ploring the relationship between job content 
and workers’ opinions. 

In all four departments there was a sharp 
cleavage between the amount of decision mak- 
ing and power exercised by salaried members 
of management compared to the hourly work- 
ers. Katz (4) has suggested that under this 
condition the thwarting of the individual’s 
self-determination and craftsmanship is more 
likely to be a problem than when there is no 
such cleavage. 

1 The writers are grateful to Orlo L. Crissey for 
permission to use the data and to C. S. Bridgman for 
his helpful comments. 
The Opinion Questionnaire 


The instrument used to survey opinions 
was a 71-item questionnaire having the same 
format and borrowing most of its content 
from the questionnaire developed by Comrey, 
Pfiffner, and High and described in their re- 
port of “Factors Influencing Organizational 
Effectiveness” (1). The selection of scales 
for this study from the Comrey questionnaire 
was based on the sizes of the correlations the 
original investigators had obtained between 
their various scales and organizational pro- 
duction measures as well as the appropriate- 
ness of the scales for an automotive assembly 
situation. 

As used in the present study, the question- 
naire consisted of 14 relatively independent 
dimensions or groups of homogeneous items 
pertinent to different content areas. Three of 
the scales were concerned with more general 
aspects of the work situation, namely: pride 
in the work group, relations with other units, 
and confidence in the company. The remain- 
ing eleven scales related to the immediate su- 
pervisor’s consistency of behavior, decisive- 
ness, discipline, judgment, job competence, 
job helpfulness, receptiveness to suggestions, 
ability to organize his work, safety enforce- 
ment, relations with subordinates, and ability 
or willingness to communicate downward. 
Four additional items, not included in the 
Comrey questionnaire, referred to over-all job 
satisfaction, over-all satisfaction with the su- 
pervisor, communications upward, and the 
quality of training that had been received. 


The Samples 


Hourly workers in four production depart- 
ments were surveyed. These departments were 
located adjacent to one another along the as- 
sembly line and each performed a set of com- 
parable assembly operations as the line passed 
through their areas. One department assembled 
the body, another painted it, a third applied 
the trim and a fourth assembled the 
chassis and then assembled it to the body. 
These departments will be referred to as A, 
B, C, and D, respectively. The organization 
of the departments was the same, each being 
divided into foreman’s sections which con- 
sisted of 20 to 30 assembly operators and one 
or two utility men. All of the utility men 
and a 20 per cent randomly selected sample 
of assembly operators from each foreman’s 
section were surveyed. 


Differences in Job Content 


The mean survey scores of the assembly 
operators and the utility men were compared. 
The content of the two jobs differed in the 
following ways. 

Assembly operators. Each assembly opera- 
tor performed a specific task or set of tasks 
as the assembly line passed his station. The 
complete time cycle for each task was be- 
tween one and two minutes. The tasks were 
either identical for each make and model of 
car or were only negligibly different. Each 
assembly operator’s job was highly repetitive, 
routine, deskilled, mechanically paced, and 
such that the end result of his efforts con- 
tributed only an infinitesimal part of the to- 
tal process of assembling a complete car. 

Utility men. These operators were assigned 
to each foreman’s section to perform various 
utility functions. These functions were (a) 
to relieve assembly operators for scheduled or 
emergency breaks, (b) to help assembly
operators who for one reason or another were
unable to keep up with the line, (c) to dem- 
onstrate the job to new operators and gradu- 
ally yield parts of the job until the new op- 
erator could keep up with the line, (d) to 
perform temporarily the job of an assembly
operator who was absent until other relief 
could be found, and (e) to complete or cor- 
rect operations done incompletely or incor- 
rectly by assembly operators. The biggest 
difference between the assembly operator job 
and the utility man job was that the former 
performed a single, routine and repetitive task 
while the latter performed a wide number of
these same routine tasks (as many as 20 or
30) and performed them for lengths of time
varying between one minute and, on
infrequent occasions, a day. The second major
difference is in the fact that the assembly 
operator had, for all practical purposes, no 
area of discretion as his job was defined. The 
particular task a utility man performed at 
any given time was largely dictated by the 
situation or assigned by the foreman but an 
area of limited choice remained which, small 
as it may have been, was greater than that 
of the assembly operator. 

This difference in job content existed in all 
four departments for a number of years and 
at the time of the survey still existed in two 
departments, A and B. In the other two de- 
partments, C and D, the content of the util- 
ity man’s job was changed shortly before the 
survey was administered. The circumstances 
surrounding this change were as follows. 

As part of a broader personnel program the 
decision was made to improve the quality of 
training that was given to newly hired as- 
sembly operators, to train experienced assem- 
bly operators on several jobs in addition to 
their regularly assigned job so that they might 
be rotated on jobs when the need arose, and 
to improve work methods used by assembly 
operators. The responsibility for implement- 
ing this plan was assigned to the utility men 
and their job duties were expanded to include 
the training and methods functions. To aid 
them in performing these new functions a 
training program in work methods and train- 
ing techniques was developed. The program 
consisted of 11 one-hour lecture sessions given 
over a five week period. After this change 
took place the utility men spent about one 
half of their time on their original duties and 
one half on their newly assigned duties. In 
order to have assurance that the utility men 
would have time for performing their new 
duties, a number of assembly operators were 
upgraded to the status of utility men to help 
in the more routine tasks of providing relief, 
making repairs, etc. 

At the time the opinion survey was
administered Department C had been assigned
new utility men and had completed its
training by two weeks, Department D had been
assigned new utility men and had completed
half its training, and Departments A and B
had not been assigned new utility men nor
had they received any training.


Table 1

Comparison of Mean Survey Scores for Utility Men and Assembly Operators in Four Production Departments

                      Dept. A               Dept. B               Dept. C                Dept. D
                      (No Training)         (No Training)         (Completed Training)   (In Training)
                      N    Mean    SD       N    Mean    SD       N    Mean    SD        N    Mean    SD
Utility Men           32   233.94  35.56    33   232.36  47.33    74   246.81  36.13     41   257.32  29.62
Assembly Operators    99   240.93  40.32    78   233.82  42.26    135  233.15  42.63     103  237.38  49.94


Results


The first comparison was made between the
mean total survey scores of the assembly
operators and the utility men in Departments A
and B. In view of the general belief that job
simplification gives rise to unfavorable
attitudes it would be expected that the assembly
operators, performing on the more simplified
jobs, should hold less favorable opinions of
their supervisors and the work situation in
general than the utility men. The results
seen in the first two columns of Table 1
indicate that there was no significant difference
between the means in either department.
(Department A, t = .89 and Department B,
t = .15.)

The scales from the questionnaire were
divided into those concerned with the more
general aspects of the work situation (pride in
work group, relations with other units,
confidence in the company and satisfaction with
the job itself) and those concerned with the
immediate supervisor. Comparison of the two
groups of workers on both of these subtotals
showed no significant differences in either
department.

Comparison between the assembly
operators and the utility men in Departments C
and D showed that the utility men in both
departments held significantly more favorable
opinions as reflected in the mean total survey
scores. These results are found in the third
and fourth columns of Table 1. (For
Department C, t = 2.43, significant at the 5 per
cent level, and for Department D, t = 2.92,
significant at the 1 per cent level.) In
Department C the mean subscores of the scales
relating to the work situation in general were
significantly higher for utility men at the 10
per cent level (t = 1.84) and those relating
to the supervisor at the 1 per cent level (t =
2.63). In Department D the differences were
significant at the 5 per cent level (t = 2.61)
and the 1 per cent level (t = 4.09) for the
general and supervisor subscales, respectively.

Since both pretraining and posttraining 
measures were not available for these
departments, we cannot dismiss entirely the
possibility that the utility men’s scores were
higher before training. Indirect evidence
suggesting this was not the case is found in 
the fact that in the two departments (A and 
B) where pretraining measures were available 
no significant differences in the total scores of 
utility men and assembly operators were ob- 
served. 
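
The t ratios reported in this section can be roughly re-derived from the summary statistics in Table 1. The sketch below assumes an unequal-variance (Welch) t ratio; the article does not state which form of the t test was used, so the function name and the choice of formula are illustrative assumptions, and small discrepancies from the printed values are to be expected.

```python
from math import sqrt


def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """t ratio for two independent groups from N, mean, and SD (unequal variances assumed)."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# (N, mean, SD) for utility men and assembly operators, as read from Table 1.
table_1 = {
    "A": ((32, 233.94, 35.56), (99, 240.93, 40.32)),
    "B": ((33, 232.36, 47.33), (78, 233.82, 42.26)),
    "C": ((74, 246.81, 36.13), (135, 233.15, 42.63)),
    "D": ((41, 257.32, 29.62), (103, 237.38, 49.94)),
}

t_ratios = {
    dept: welch_t(*utility, *assembly)
    for dept, (utility, assembly) in table_1.items()
}

for dept, t in t_ratios.items():
    # A positive t means the utility men scored higher than the assembly operators.
    print(f"Department {dept}: t = {t:+.2f}")
```

Under this assumption the ratios for Departments C and D fall close to the reported 2.43 and 2.92, while those for Departments A and B stay below 1 in absolute value.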

Discussion 


Contrary to general expectations no
difference was observed in the favorableness of
opinions of the workers on two jobs of clearly 
different content in Departments A and B. 
The question might be raised as to whether 
the survey instrument was sensitive enough 
to reflect the kinds of differences that might 
reasonably have been expected. Comparison 
of the scores for utility men and the assembly 
operators in Departments C and D suggests 
that it was. 

The fact that utility men had more favor- 
able attitudes than assembly operators in 
these two departments may be accounted for 
by any one, or combination of, the following 
kinds of influences. 

The mean scores for utility men in
Departments C and D were based on the
questionnaire scores of two types of utility men: those
who recently had been upgraded from the job 
of assembly operator and had not received the 
training program and those who recently had 
been assigned the responsibility for training 
and methods and had received (or were re- 
ceiving) the training program. 

In the case of the newly appointed utility 
men an expansion of job duties and change in 
status was involved. Either of these factors 
may have influenced their opinions favorably. 
From the results in Departments A and B 
these effects would not be expected to persist. 

In the case of the experienced utility men, 
the expansion of their duties to include the 
more challenging functions of methods and 
training may have brought about the more 
favorable attitudes. The change in job con- 
tent was such that entirely new skills and 
knowledge would be involved and consider- 
ably more leeway in decision making and self- 
scheduling required. Since the content had 
been changed so recently the permanence of 
any such effect would be in considerable 
doubt. 

An alternate explanation might be the pos- 
sible operation of a Hawthorne effect. The 
favorable effect might not have been the ex- 
pansion of job duties as much as the fact that 
management had singled out this group for 
special treatment. The training program in 
question was the first time that management 
had taken hourly operators off their jobs to 
provide them with classroom training. 

In any case, the survey did appear to be 
sensitive to the kind of differences that would 
have been expected if the original groups of 
utility men and assembly operators in De- 
partments A and B had held different opin- 
ions as a result of differences in their job 
content. 


Summary and Conclusions 


Assembly operators performing highly rou- 
tine and repetitive tasks held no less favor- 
able opinions toward their supervisors and to 
the work situation than did utility men per- 
forming a wide variety of these routine tasks. 
The instrument used to survey these opinions 
was seen to be sensitive enough to show
differences between assembly operators and
utility men when utility men were singled out by
management for special treatment and had 
their job duties further expanded. If job con- 
tent is a factor in determining how favorably 
workers view their supervisor and their work 
situation, the difference in content apparently 
must be along more fundamental dimensions 
than those observed in this study.


The development of each new mental
measurement device brings with it the necessity for
a variety of studies which supply information 
adequate to describe relevant performance 
characteristics of the new device. The Purdue 
Non-Language Adaptability Test (hereinafter 
referred to as the PNLAT) is one such new 
device and may be classified as a non-language 
group test of mental ability, intended for busi- 
ness and industrial use. Previous reports (1, 
2, 3) have supplied PNLAT norms, reliability 
and validity estimates, and information con- 
cerning various other characteristics of this 
test.2 The writer feels that the present study
makes some additional contributions and may 
be of interest to those contemplating use of 
the PNLAT. 


Problem 


This study was conducted to provide at 
least partial answers to the following ques- 
tions: 1. What is the predictive validity of the 
PNLAT in a training situation which is fairly 
typical of many found in business and in- 
dustry? 2. How does the predictive validity 
of the PNLAT compare with that of a more 
common language intelligence measure? 3. 
Does the test exhibit a reasonable degree of 
construct validity, i.e., does the PNLAT bear 
an acceptably high relationship to another 
measure of intelligence? 4. Does the PNLAT 
correlate significantly with variables which 
should be almost independent of intelligence? 
Question four has been divided into two parts. 
First, is the PNLAT correlated with near
distance, binocular visual acuity? The
contents of PNLAT items suggest that such a
relationship might exist.3 Second, in a mature,
presenile group, does a relationship exist
between PNLAT and the age variable?

1 This study was conducted while the writer was at
the Purdue Calumet Extension Center, Hammond,
Indiana.

2 No journal articles concerning the PNLAT have
appeared prior to this and readers are therefore
referred to the three relevant theses. These theses may
be obtained on loan from the Purdue University
Library, Lafayette, Indiana.

3 The PNLAT may be described as a 36-item,
15-minute time limit group test. Each item consists of
a set of 10 geometric designs or patterns, four of
which are identical. In all but seven of the items,
the boundaries of patterns within an item are
identical and discriminations must be based upon elements
of visible detail within the boundaries. The S is
required to identify the four identical alternatives
within a given group of 10 and to indicate his
choices by marking a large cross through the
alternatives chosen.


Procedure


The sample. Test subjects were 62 male students
enrolled in a night course in introductory psychology
at the technical institute level. Ages ranged from
18 to 48 years, with a mean of 29.8, a standard
deviation of 8.4. All but two of the Ss were employed
full time in business, industry, or government. The
two unemployed Ss were enrolled as full-time
students. Thirty seven per cent of the group were
employed in the steel production industry, 26% in
machinery or automotive production, 19% in the
petroleum industry, 10% in public utilities, and 5%
in miscellaneous businesses or government work.
Eighteen per cent held supervisory positions. Thirty
seven per cent were salaried, 60% were hourly paid.

The average number of years of formal education
was 11.8 and the group was extremely homogeneous
with respect to this variable. Seventy-one per cent
had completed exactly 12 years of education and an
additional 13% had completed either 11 or 13 years.

Nature of the training. The introductory
psychology course in which the Ss were enrolled was
taught in 16 two-hour sessions and is part of various
Purdue Technical Institute curricula. The course
dealt with most of the usual topics to be found in
introductory psychology courses, e.g., perception,
emotion, and intelligence, but emphasized applications
of psychological principles to problems of worker
supervision and the understanding of fellow workers.

The composition of the sample and the level at
which the course was taught suggest that this
situation has much in common with many business and
industrial training situations.

Obtaining the data. Each of the 62 Ss completed
a personal information questionnaire at the beginning
of the fourth class meeting. Immediately following
this, the PNLAT4 and the Adaptability Test, Form
A (6), were administered. The Ss received and
completed the tests in approximately counter-balanced
order. Thirty-four of them took the Adaptability
Test prior to the PNLAT and 28 took the two tests
in reverse order. Recommended testing procedures
were followed.

4 The test administered was the final preliminary
form of the PNLAT, then designated as the NLPT
(Non-Language Personnel Test), Form AB. All
items in the preliminary and published forms of the
test are identical. The two forms differ only in the
contents of their cover pages.

Visual acuity was tested with the near distance, 
binocular visual acuity subtest of the Bausch and 
Lomb Ortho-Rater. Individual testing sessions were 
arranged at the convenience of the Ss and extended 
throughout the 16-week semester. The Ss were 
tested with or without spectacles, depending upon the 
condition which had prevailed when the intelligence 
tests were administered. 

The criterion used in estimating the predictive
validity of the PNLAT consisted of scores on 120
multiple choice achievement items covering material
taught in the introductory psychology course. Scores
were not corrected for chance success. The 120 items
were administered, untimed, as two tests
(mid-semester and final examination) and the odd-even
reliability of this criterion, S-B corrected, was found
to be .78. There was no contamination of the
criterion through illicit use of predictor data.
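
The "S-B corrected" figure refers to the Spearman-Brown step-up applied to the correlation between odd and even halves. As a minimal sketch, the functions below (names are illustrative, not from the article) apply that formula and invert it to show what odd-even correlation would be implied by the reported .78; the uncorrected half-test correlation itself is not given in the text.

```python
def spearman_brown(r_part, factor=2.0):
    """Step a part-test correlation up to the reliability of a test `factor` times as long."""
    return factor * r_part / (1.0 + (factor - 1.0) * r_part)


def half_test_r(r_full):
    """Invert the two-half Spearman-Brown formula to get the implied odd-even correlation."""
    return r_full / (2.0 - r_full)


reported_reliability = 0.78  # S-B corrected odd-even reliability from the text
implied_odd_even_r = half_test_r(reported_reliability)

print(f"implied odd-even r = {implied_odd_even_r:.3f}")
print(f"stepped up again   = {spearman_brown(implied_odd_even_r):.2f}")
```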

The student group’s average scores and standard 
deviations of scores for the tests which constitute 
the main variables of this study are presented in 
Table 1.


Results and Discussion 


Table 2 presents the Pearson product-moment
coefficients of correlation upon which
the major portion of the conclusions of this
study are based.

Concerning the first of our questions, it is 
found that the PNLAT correlates .367 with 
the achievement test criterion and that this r 
is significantly different from zero beyond the 
.01 level of confidence. Therefore, it may be 
concluded that the PNLAT has demonstrated 
statistically reliable predictive validity. 
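
One conventional way to check such a claim is to convert r to a t ratio with N - 2 degrees of freedom. The sketch below assumes that approach (the article does not say how significance was assessed) and uses two-tailed critical values for 60 df taken from a standard t table; the function and constant names are illustrative.

```python
from math import sqrt


def t_from_r(r, n):
    """t ratio for testing a Pearson r against zero, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1.0 - r ** 2)


N = 62
# Two-tailed critical values of t for 60 df, from a standard t table.
T_CRIT_05 = 2.000
T_CRIT_01 = 2.660

t_validity = t_from_r(0.367, N)  # PNLAT vs. achievement criterion
print(f"t = {t_validity:.2f} on {N - 2} df")
print("beyond .05 level:", abs(t_validity) > T_CRIT_05)
print("beyond .01 level:", abs(t_validity) > T_CRIT_01)
```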

The second question concerns the relative
effectiveness in prediction of the PNLAT and
the Adaptability Test. Here, Hotelling’s F
test (4, p. 54) was used to test the null
hypothesis that the predictive validity coefficients
of the PNLAT (.367) and the Adaptability
Test (.597) are not different. The obtained
F ratio is 44.26 (with 1 and 59 df), significant
beyond the .001 level of confidence. It must
be concluded that, in the present case, the
Adaptability Test was a significantly better
predictor than the PNLAT.


Table 1

Test Score Averages and Standard Deviations
(N = 62)

Test Name                      Average    Standard Deviation
PNLAT
Adaptability Test
Ortho-Rater, visual acuity
Criterion achievement test


Table 2

Correlations Between Variables

                   Criterion    (D)       (C)       (B)
Age (A)             -.012      -.286*    -.192     -.326**
PNLAT (B)            .367**     .156      .433**
Adapt. Test (C)      .597**     .290*
Vis. Acuity (D)      .297*

* Significant beyond .05 level
** Significant beyond .01 level

The correlation of .433 between the PNLAT 
and the Adaptability Test gives evidence of 
the PNLAT’s construct validity and pertains 
to question three. This coefficient is sig- 
nificantly different from zero beyond the .01 
level. 

The first part of question four deals with 
the relationship between scores on the PNLAT 
and the near distance, binocular visual acuity 
subtest of the Ortho-Rater. The obtained r 
of .156 is not significantly different from zero. 
Also, it is remarkably similar to the r re- 
ported by Albright (1, p. 20), who obtained 
a nonsignificant correlation of .136 between 
the same two variables. Therefore, it seems 
reasonable to conclude that, notwithstanding 
the apparent visual demands of the PNLAT 
test items, test performance is essentially un- 
influenced by S’s visual acuity. As a matter 
of incidental importance, it may be seen that 
both the Adaptability Test and the criterion 
achievement test correlated positively and sig- 
nificantly with measured visual acuity. Neither 
of these findings is readily explainable nor 
were they anticipated. 

The correlation of —.326 which was found 
to exist between PNLAT scores and chrono- 
logical age relates to the second part of ques- 
tion four. This r is significantly different 
from zero beyond the .01 level of confidence. 
It suggests the likelihood of problems in
applying the PNLAT in situations where marked
age variations exist in the sample of Ss.
Perhaps age norms such as Wechsler (7) and
others have used would be helpful to PNLAT 
users. 

Although predictive validity coefficients 
greater than that in the present study have 
been frequently obtained, the r of .367 which 
was found to exist between PNLAT and the 
criterion represents a significant and, in some 
cases, useable amount of prediction. Further- 
more, it seems that there are at least two 
reasons for considering the present prediction 
situation as less than an ideal one in which to 
use a non-language test. First, the student 
sample exhibited a relatively high level of 
formal education and such students might be 
expected to display their talents on verbal
tests without serious disadvantage. Second, 
the criterion to be predicted contains an un- 
questionably large verbal component which is 
not likely to be satisfactorily measured by 
nonverbal tests. It seems reasonable to sup- 
pose that the PNLAT will be of most value 
in situations involving subjects with limited 
knowledge of a language and involving a cri- 
terion which contains no significant verbal 
component. 

Lindner and Gurvitz report a correlation of 
.92 between Wechsler intelligence quotients 
and Revised Army Beta scores (5, p. 654). 
Such outstanding (!) coefficients are not nor- 
mally to be expected from investigations of 
construct validity, and it would seem that the 
correlation of .433 which was obtained in the 
present study satisfactorily establishes the 
presence of construct validity in the PNLAT. 
Householder reports a median correlation of
.446 between the PNLAT and various verbal
mental ability tests (3, p. 50). 

The findings of this study indicate that the 
PNLAT performs about as well as might be 
expected of it. It demonstrated both predic- 
tive and construct validity and absence of 
significant contamination by variations in 
visual acuity. Its apparent discrimination 
against older test Ss does not pose problems 
which cannot be satisfactorily resolved in most 
test usage situations. It would seem that the 
PNLAT merits additional research and use 
and it is likely that, properly applied, the 
PNLAT can be a valuable addition to per- 
sonnel selection and placement operations. 


Received February 10, 1958.


If accurate predictions of human performance
times for psychomotor tasks can be
made, alternative designs for machines, tools, 
and workspaces may be evaluated without 
actually building experimental models. The 
obvious and substantial economic advantages 
to be gained have led to the construction of 
a number of standard data time systems by 
workers in the field of work measurement (1, 
7). The validity of such systems has been 
under constant attack, however, because of 
doubts about the basic assumption of addi- 
tivity of the motion elements when these ele- 
ments are applied in cycles and sequences 
different from those in which the data were 
gathered originally. The principal basis for 
supporting these doubts has come from the 
results of a number of experiments by differ- 
ent investigators which rather unequivocally 
show that interaction exists among the times 
of elements, that is, element time is not only 
a function of certain major variables such as 
distance, class of fit, etc. but also a function 
of other elements in the cycle (e.g., 2, 4, 5, 
8, 9, 10). In general, the results of the
various investigations have suggested that the
interaction is mainly with the elements
immediately adjacent to the element in question.

Though from the viewpoint of element in- 
teraction or correlation, the concept of addi- 
tivity appears to be limited, it would appear 
worthwhile to ask if the mean cycle time for 
a task could be predicted from known stand- 
ard element times despite interactions that 
might make a given element time inaccurate 
in a particular situation. The rationale for 
the feasibility of such a possibility is that in 
statistics it is well known that a set of
random variables, x, y, and z, connected by a
joint density function, f(x, y, z), will yield an
expected value of their sum that is equal to
the sum of their separate expected values,
regardless of interaction between the variables,
i.e., E(x + y + z) = E(x) + E(y) + E(z)
(5). If the motion element times in
standard data can properly be assumed to be
random variables it should be true that mean
cycle times can be predicted from a
knowledge of expected (mean) values of the
individual elements. The mean total cycle time
predicted in this manner will, of course, have 
a variance, part of which is accounted for by 
the element interaction or intercorrelation. 
Knowledge of the degree of correlation would 
permit a more precise estimate of the cycle- 
time but it remains a question as to whether 
or not the variance due to element correla- 
tions increases total variance to the point 
where estimates of mean cycle times are not 
sufficiently precise to use. 
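
The argument above can be illustrated numerically. The following sketch simulates three interacting element times (all distributions and coefficients are hypothetical, chosen only for illustration) and shows that the mean of the cycle total equals the sum of the element means regardless of the interaction, whereas the variance of the total differs from the sum of the element variances by the covariance terms.

```python
import random
from statistics import fmean, pvariance


def pcov(a, b):
    """Population covariance of two equal-length samples."""
    mean_a, mean_b = fmean(a), fmean(b)
    return fmean([(u - mean_a) * (v - mean_b) for u, v in zip(a, b)])


def simulate_cycles(n_cycles=5000, seed=1):
    """Draw hypothetical element times (sec.) in which adjacent elements interact."""
    rng = random.Random(seed)
    x, y, z = [], [], []
    for _ in range(n_cycles):
        xi = rng.gauss(0.40, 0.05)
        # y runs long when x ran long; z runs short when y ran long.
        yi = 0.30 + 0.8 * (xi - 0.40) + rng.gauss(0.0, 0.03)
        zi = 0.50 - 0.5 * (yi - 0.30) + rng.gauss(0.0, 0.04)
        x.append(xi)
        y.append(yi)
        z.append(zi)
    return x, y, z


x, y, z = simulate_cycles()
cycle = [a + b + c for a, b, c in zip(x, y, z)]

mean_of_sum = fmean(cycle)
sum_of_means = fmean(x) + fmean(y) + fmean(z)

var_of_sum = pvariance(cycle)
sum_of_vars = pvariance(x) + pvariance(y) + pvariance(z)
cov_part = 2.0 * (pcov(x, y) + pcov(x, z) + pcov(y, z))

print(f"mean of cycle totals  : {mean_of_sum:.4f}")
print(f"sum of element means  : {sum_of_means:.4f}")
print(f"variance of totals    : {var_of_sum:.6f}")
print(f"sum of variances      : {sum_of_vars:.6f}")
print(f"covariance contribution: {cov_part:+.6f}")
```

The means agree by construction of the arithmetic, so the open question raised in the text is only how much the covariance contribution inflates the variance of the predicted cycle time.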

Two studies, one by Ghiselli and Brown (3, 
p. 369) and one by Stiling (11), have direct 
bearing on the matter. The results of the 
first, which involved two patterns of key tap- 
ping sequences in which some of the keys were 
eliminated from each pattern and the pre- 
dicted and actual cycle times compared, sug- 
gested that additivity does not hold. Appar- 
ently only one S was used, however, and no 
statistical treatment of the data was avail- 
able. In Stiling’s study, 24 Ss performed a 
simple task which involved travel and ma- 
nipulation motions in aligning a pointer to a 
dial scale marking by means of a rotary knob. 
In the first task 5 dials were manipulated. 
In the second, one of the dials was left out of 
the sequence, in the third, two, and in the 
fourth task, three dials were left out. Two 
levels of alignment difficulty were used. The 
basic hypothesis was tested by predicting the 
times for the successive reduced cycles from 
measurements made in the complete cycle 
and comparing these predictions with the 
actual times taken for the reduced cycles. 
The conclusion was that the differences be- 
tween the actual and estimated cycle times 
were explainable as chance variation, that is,
additivity held for the conditions of the
experiment. One might point out that in this
experiment the motion pattern was not
complex although the task did require some
manipulation. In addition, direction of motion
nipulation. In addition, direction of motion 
was a confounded variable since the direction 
of the first travel motion changed with each 
successive reduction in the task. 

As the results of these two studies were in 
direct conflict, the question of element
additivity remains open. In addition, both studies
involved simple motion patterns which were
not representative of the complexity of
typical industrial tasks. The present study was
designed, therefore, to attempt to resolve the 
differences and extend the applicability of the 
additivity concept to more complex motion 
patterns which might be expected to influ- 
ence the interaction of the motion elements 
composing it. 


Method 
Design of Experiment 


Figure 1 shows the work place layout which was
used to test the hypothesis of additivity. The task
required the S to assemble two parts in a fixture.
Part A was a pin which was an inch and a half long
and a quarter of an inch in diameter, chamfered on
one end. Part B was a bushing, three quarters of
an inch in diameter, one-half inch thick, and with
a concentric hole of 0.26 in. The cycle for the right
hand is given below:


1. Reach to Part A
2. Grasp Part A
3. Move to manual advance button, depress button
4. Move Part A to Hole 3 in fixture
5. Position Part A in hole and release
6. Reach to Part B
7. Grasp Part B
8. Move Part B to Hole 4 in fixture
9. Position Part B in hole and release
10. Reach to Part A in Hole 3
11. Grasp Part A and remove from Hole 3
12. Move Part A to Part B in Hole 4
13. Position Part A in Part B
14. Release Part A
15. Reach to disposal button
16. Depress button


Fig. 1. Workplace layout.

Part A was fed to the S from a dual feeder at a 
rate determined by S. The S took Part A from the 
well, pressed a manual advance button and then 
moved Part A to the fixture and positioned it in 
Hole 3. Part B, which was a bushing, was then pro- 
cured from its rack by S and moved to the fixture 
and positioned in Hole 4. Part A was then taken 
from Hole 3 and positioned in the center hole of 
Part B which was in Hole 4. The S then pressed 
the disposal button energizing a solenoid which with- 
drew a slide causing Parts A and B to drop into a 
disposal chute. The table height was set at 28 in. 
and an adjustable chair was used to vary the rela- 
tive height of the S in relation to the table. In all 
instances the chair was adjusted so that the S’s el- 
bow was about 3 in. above the table top with erect 
sitting posture. 

The basic hypothesis was tested by measuring
times in a complete and incomplete cycle. The
complete cycle consisted of all of the elements 1 through
16. Elements 9 through 14 were measured
separately. The times for Elements 9 through 14 were
subtracted from the total complete cycle to yield a
forecast of the time for an incomplete cycle.
Elements 1 through 8 plus 15 and 16 were measured
under the conditions of the incomplete cycle. If
additivity held, the mean times measured in this way
should equal the mean times of the cycles where the
Elements 10 through 13 were not a part of the
manual cycle. Element 9 was measured separately
and its time eliminated from both incomplete and
complete cycles because of details in the
instrumentation system which made it overlap in both
complete and incomplete cycle conditions. The contrast
between the times for Elements 1 through 8 plus 15
and 16 measured under complete cycle conditions,
with all elements included, and incomplete cycle
conditions, where Elements 10 through 13 were not part
of the task, represents a measure of how closely
elements measured in one sequence can predict the
times required to perform the same elements in a
different sequence. The validity of the concept of
additivity is thus tested for the particular task
situations examined.
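
The forecasting arithmetic described above can be sketched as follows. The function names and all timing values are hypothetical and are not data from the study; the sketch only shows how a forecast from the complete cycle would be set against the time observed in the incomplete cycle.

```python
from statistics import fmean


def forecast_incomplete(complete_total, added_elements):
    """Forecast: complete-cycle time minus the time for the added elements (9 through 14)."""
    return complete_total - added_elements


def observed_incomplete(incomplete_total, element_9):
    """Observed comparison time: incomplete-cycle time with Element 9 removed."""
    return incomplete_total - element_9


# Hypothetical per-S mean times in seconds; NOT data from the study.
subjects = [
    # (complete total, Elements 9-14, incomplete total, Element 9)
    (6.10, 2.30, 4.45, 0.62),
    (5.85, 2.20, 4.30, 0.60),
    (6.40, 2.45, 4.60, 0.66),
]

errors = [
    forecast_incomplete(complete, added) - observed_incomplete(incomplete, e9)
    for complete, added, incomplete, e9 in subjects
]
mean_error = fmean(errors)
print(f"mean forecast error = {mean_error:+.3f} sec.")
```

If additivity held, the mean error would differ from zero only by chance variation.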

Variables of discrimination and hands used (right
hand vs. both hands) were also introduced.
Discrimination was introduced by painting a yellow
band around Part B. In half of the trials the S was
required to take only parts with the yellow band
and therefore a measure of the effect of
discrimination was provided by a contrast of the times where
visual discrimination was required compared to those
where no discrimination was required. The hands
used variable compared cycle times where the task
was performed with the right hand only with cycle
times where the task was performed symmetrically
with both hands.

The three variables: cycle completeness, discrimi- 
nation, and hands used, were combined in a full 
factorial experiment with two levels for each factor. 
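
As a small illustration of the design just described, the eight treatment combinations of a 2 x 2 x 2 full factorial can be enumerated directly. The level labels below are paraphrased from the text and the variable names are illustrative.

```python
from itertools import product

# Three factors with two levels each, as described in the text.
factors = {
    "cycle": ("complete", "incomplete"),
    "discrimination": ("required", "not required"),
    "hands": ("right hand", "both hands"),
}

# Every combination of one level per factor.
treatments = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for number, treatment in enumerate(treatments, start=1):
    print(number, treatment)
```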

The following measurements were made: 


1. Total cycle times

2. Time for the added elements (these times
occurred in only the complete cycle), Elements 9
through 14.

3. Time for the grasp plus transport loaded of
Part B (bushing), Elements 7 and 8.

4. Time for the release of Part A (pin) after its
insertion in the bushing hole (these times
occurred in only the complete cycle), Element 14.

5. Time for the position plus release of Part B
(bushing) in the fixture (these times were
measured in only the incomplete cycle), Element 9.


Instrumentation 


To instrument the work place of Fig. 1 it was 
necessary to be able to measure total cycle time in 
both the complete and the incomplete cycles and the 
additional elements required to perform the
complete cycle as compared to the incomplete cycle.
Measurements were accumulated for 20 cycles in all 
instances so that each measurement represented an 
average. The total cycle times were accumulated on 
a precision clock. In order to measure the addi- 
tional elements required to perform the complete 
cycle a combination of three Eccles-Jordan elec- 
tronic switches and counters was required. The 
electronic switches were triggered by a 100 kc elec- 
trical pulse which was generated by a transmitter 
under the S’s chair and traveled over the surface of 
his skin. The electronic switches and counters had 
a basic accuracy of .001 sec. All instrumentation 
was put into operation by turning a multiple rotary 
switch. This was done by the experimenter and was 
synchronized with the pressing of the disposal button.
All time measurements were made for the right
hand. 


Subjects and Routine 


Sixteen male university students with preferred 
right-handedness served as Ss. Each S was sampled
twice for each treatment combination, using an
accumulation of 20 cycles taken after a standard
practice period of 50 cycles. A comparison of the
first sample with the second provided a measure of
practice effects.

In pretests of the task, it was noted that major 
changes in the task severely affected the S’s ability 
to perform effectively. These negative transfer ef- 
fects were particularly large for changes from the 
complete cycle to the incomplete cycle and vice 
versa. It was the tendency of Ss to introduce ex- 
traneous elements in the incomplete cycle and to 
omit them in complete cycles. This was also true, 
to a lesser extent, for changes in the hands-used 
variable. Because of this the variables were counter- 
balanced. Half of the Ss (chosen at random) were 
presented with the sequence, complete-incomplete, 
and the other half with the sequence, incomplete-complete,
on a two-day schedule. Each group was
further subdivided so that half of them were pre- 
sented with a right-hand-both-hands sequence, and 
the other half with the reversed schedule. Within 
this structure, the presentation sequence was ran- 
domized. 
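
The counterbalancing scheme just described can be sketched as a random split of the 16 Ss into four equal groups. This is an illustration under stated assumptions: the function name, the seed, and the subject numbering are hypothetical, and the within-group randomization of the remaining presentation order is not modeled.

```python
import random

def assign_sequences(subjects, seed=0):
    """Randomly split subjects into four equal counterbalancing groups.

    The first split fixes the order of the cycle conditions, the second
    the order of the hands-used conditions. The seed is an assumption
    added here only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    quarter = half // 2
    cycle_orders = ("complete-incomplete", "incomplete-complete")
    hand_orders = ("right-hand-both-hands", "both-hands-right-hand")
    plan = {}
    for index, subject in enumerate(shuffled):
        cycle_order = cycle_orders[index // half]
        hand_order = hand_orders[(index % half) // quarter]
        plan[subject] = (cycle_order, hand_order)
    return plan

plan = assign_sequences(range(1, 17))
for subject in sorted(plan):
    print(subject, plan[subject])
```

With 16 Ss each of the four sequence combinations receives four Ss, so order of presentation is balanced across both the cycle and the hands-used variables.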


Results and Discussion 


The over-all means and standard deviations 
for the basic data are presented in Table 1. 

Preliminary analysis of the raw data for the 
five measurements by use of Bartlett’s test 
for homogeneity of variances indicated a re- 
jection of homogeneity as a hypothesis for all 
measurements except No. 4, time for the re- 
lease of Part A. To obtain homogeneity of 
variances a logarithmic transformation of the 
time scale was used in the detailed analysis of 
variance calculations, except in the case of 
Table 1


Means and Standard Deviations for Measured Elements 








Incomplete Cycle 





Mean 


(Seconds per 20 cycles) 


SD 





Sample Sample 
Item Measured 1 2 1 





99 


4.59 . 
4.59 


16.64 


Cycle* 4.71 
Elements 7 and 8 16.91 
Element 14 — - 
Element 9 6.66 6.59 
Elements 9-14 — _- 


Sample Sample 


Complete Cycle 


Mean SD 


Sample Sample Sample Sample 
2 1 2 1 2 





4.62 97 89 
4.55 16.23 15.83 4.84 4.14 
4.65 4.67 1.67 1.65 


28.31 28.04 


89 4.67 


4.93 5.15 





* Cycle times are in seconds per cycle.


measurement No. 4. P was set at the .01 level.

The partitioning of degrees of freedom requires
the separation of subject effects (S),
treatment effects (T), order effects (O), and
the interactions of the main effects. O reflects
possible differences between Work Samples
1 and 2, and as such represents practice
effects during the experiments. The treatment
effects can be further partitioned into the
main effects and interactions of the experimental
variables which are of particular interest
in these experiments. Table 2 shows
the analysis of variance for cycle times. The
S × T interaction was highly significant, and
thus it becomes the appropriate error term
for testing the significance of S and of T,
since we are interested in T independent of
S and vice versa.


Table 2
Analysis of Variance, Log Cycle Times

Source              df    Sum of Squares    Mean Square
Subjects (S)        15    0.33485           0.02232
Treatments (T)       7    1.14546           0.16363
Order (O)            1    0.00335           0.00335
S × T              105    0.25248           0.00240
S × O               15    0.00746           0.000497
T × O                7    0.00471           0.000673
S × T × O          105    0.05700           0.000543
Total              255    1.80531

* p < .01.


The F ratios for S and for T are both 
highly significant. The S is common in this 
type of experiment, but it is T independent of 
S which is of principal interest here. The 
latter are displayed in Table 3. In comput- 
ing the F ratios in Table 3, the mean square 
for S X T of 0.00240 with 105 degrees of 
freedom is the divisor. 
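
The F ratios discussed here follow directly from the reported mean squares. The sketch below assumes nothing beyond the mean squares printed in Tables 2 and 3 and the S × T error term of 0.00240; the variable and function names are illustrative, and no significance levels are computed.

```python
# Mean squares as reported in Tables 2 and 3 (log cycle times).
ERROR_MS = 0.00240  # S x T interaction, 105 degrees of freedom

table2_ms = {
    "Subjects (S)": 0.02232,
    "Treatments (T)": 0.16363,
}

table3_ms = {
    "C": 0.0000131,
    "D": 0.1489,
    "H": 0.968,
    "CxD": 0.00150,
    "CxH": 0.000581,
    "DxH": 0.0211,
    "CxDxH": 0.00548,
}

def f_ratio(mean_square, error_ms=ERROR_MS):
    """F ratio of an effect tested against the S x T mean square."""
    return mean_square / error_ms

table2_f = {source: f_ratio(ms) for source, ms in table2_ms.items()}
table3_f = {source: f_ratio(ms) for source, ms in table3_ms.items()}

for source, f_value in {**table2_f, **table3_f}.items():
    print(f"{source:15s} F = {f_value:.3f}")
```

Dividing each treatment mean square by the same error term reproduces the F column of Table 3 to within rounding, which is a useful check on the tabled values.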

The most important implication of the re- 
sults shown in Table 3 is that cycle com- 
pleteness is not significant. In other words, 
the actual incomplete cycle times were fore- 
cast by measurements made in the complete 
cycle and any differences between actual and 
forecast incomplete cycle times may be ac- 
counted for by chance. This result occurs in 
the presence of highly significant effects for 
the discrimination and hands-used variables 
and for the discrimination by hands-used in- 
teraction. 

A logical question at this point concerns 
the magnitude of the added elements, for, if 
they were minute, it could be argued that 
forecast incomplete cycle time and actual in- 
complete cycle time were equal because an 
insignificant element was subtracted from the 
complete cycle. An examination of the data 
shows, however, that the mean of added ele- 
ments of 1.41 sec. per cycle was about 30% 
of the incomplete cycle time of 4.64 sec. and 
about 23% of the mean complete cycle time 
of 6.05 sec. This is hardly an insignificant 
proportion. 

Separate analysis of data for Elements 7 
and 8 (grasp plus transport loaded of Part 
Table 3

Log Cycle Time, Treatment Effects

Source                     df    Mean Square    F
Cycle Completeness (C)      1    0.0000131        0.00546
Discrimination (D)          1    0.1489          62.04*
Hands Used (H)              1    0.968          403.00*
C × D                       1    0.00150          0.625
C × H                       1    0.000581         0.242
D × H                       1    0.0211           8.79*
C × D × H                   1    0.00548          2.28
Total                       7

* p < .01.


B) indicated that cycle completeness was significant
at p < .01. This means that for
grasp plus transport loaded the elements that 
preceded or followed in the sequence appar- 
ently affected the time. This is evidence that 
element interaction existed in the experiment. 
This strengthens the conclusion that additivity
may exist for mean cycle times even
though element interaction exists. Again, the 
question of magnitude arises. Grasp plus 
transport loaded contains significant effects 
due to cycle completeness and yet forecast 
and actual total cycle times are equal. Can 
this be accounted for because the time for 
grasp plus transport loaded is so small that 
its effect is masked in a long over-all cycle? 
The mean time for grasp plus transport 
loaded was 11.8% of the total cycle mean 
and thus represented a relatively important 
element in the cycle. We have concluded, 
therefore, that within the limits of this ex- 
periment additivity has been shown to be a 
valid concept for use in predicting total cycle 
time for light manipulatory tasks involving 
several motion elements. The results imply 
that it might be worthwhile to make large
sample population studies of industrial workers
to obtain reliable standard data for making
the predictions necessary for the effective
design of new tasks.


Summary 


The purpose of this study was to deter- 
mine whether or not additivity of motion ele- 
ments holds for over-all cycle time predictions 
despite interactions among the elements. Precise
time measurements were made in a light
manual assembly task requiring 16 motion 
elements in the complete cycle and 10 motion 
elements in the incomplete cycle for 16 male 
Ss. The results indicated that total incom- 
plete cycle times predicted from data obtained 
in the complete cycle did not differ signifi- 
cantly from times actually measured even 
though there was evidence of interactions 
among the motion elements and the variables 
of discrimination and hands-used (one-handed 
versus two-handed performance). It was con- 
cluded that additivity of motion elements 
does, indeed, seem to be a valid concept 
where several motion elements are involved. 


Received February 14, 1958. 
Field Training Versus Technical School Training for
Mechanics Maintaining a New Weapon System ¹


Chester J. Judy 
In the United States Air Force and its
predecessor organizations a varying amount 
of emphasis has been given, from time to 
time, to centrally located and organized pro- 
grams of training for maintenance personnel. 
In the Army Air Corps before World War II, 
for example, relatively few personnel received 
training at separate technical schools. Dur- 
ing World War II and for a time thereafter, 
however, the vast majority of those who be- 
came responsible for day-to-day maintenance 
of air vehicles were individuals who had re- 
ceived such training in some early portion of 
their military service. Presently, fewer air- 
men, once again, are being given complete 
training at separate technical schools. In- 
stead, somewhat greater emphasis is being 
placed upon what is generally termed “field 
training.” 

In field training, technical instruction cover- 
ing specified pieces of Air Force equipment is 
made available, through mobile training units, 
at strategic, defense, tactical, or other bases 
where “live” equipment is on hand. One ad- 
vantage of such instruction over instruction 
given at separately organized and located 
technical schools is that airmen can learn on 
the job. Training is accomplished on a part- 
time basis and partly trained airmen are 
otherwise made available, during a learning 
period, for maintenance duty at low levels of 
responsibility. In periods when trained per- 
sonnel are in extremely short supply or when 
the turnover rate among these personnel is 
high, field training on an enlarged scale is per- 
haps especially appropriate. 


1 The research reported in this paper was sponsored
by the Personnel Laboratory, Wright Air Development
Center, Air Research and Development
Command, under Project No. 7950. Acknowledgments
are due John Schmid, who made valuable
suggestions during the conduct of the study, and
Joseph E. Morsh, who provided an additional critical
review of the draft copy of the manuscript.
Problem


A crucial issue, in most instances when a 
choice of training must be exercised, is whether 
or not alternative plans provide equal oppor- 
tunity for learning on the part of specified 
groups of trainees. The present investigation 
was accomplished in order to answer two gen- 
eral questions concerning the training of me- 
chanics for the maintenance of one important 
new weapon system: 

1. Is there a significant difference, on the 
whole, in job knowledge of B-52 aircraft me- 
chanics who have received field training as 
compared with those who have completed a 
more formal technical school course? 

2. Is there a significant difference, at par- 
ticular levels of mechanical aptitude and 
maintenance experience, in job knowledge on 
the part of airmen who have been exposed to 
the two kinds of training environment? 

In one situation mechanics received main- 
tenance training on the B-52 aircraft in resi- 
dence at an Air Force technical school. The 
training was given over a period of two months 
on a full-time basis. In the other situation 
mechanics received maintenance training on 
the B-52 aircraft through mobile training 
units on duty at operational sites. The dura- 
tion of this training was also two months, but 
the trainees spent one half of each day on the 
job. The net gain, from an administrative 
point of view, was about 20 man-days for 
every individual who received field training 
rather than technical school training. 

Relative to both of the questions asked 
above, an hypothesis of no difference was 
adopted since the two courses presumably 
covered the same subject matter and since 
the total exposure to B-52 maintenance (in 
the classroom or on the job) was the same 
for the two groups. 
Procedure


The statistical design adopted for this study 
is based upon the use of the Johnson-Neyman 
Technique. In computation and plotting op- 
erations actually performed, however, for- 
mulas and procedures proposed by Walker and 
Lev (4) were used because they permit some- 
what easier computation than those developed 
by Johnson and Neyman (3). In either pro- 
cedure the basis of comparison between cate- 
gorical groups (in this study the group com- 
prised of individuals who had received field 
training versus the group comprised of indi- 
viduals who had attended technical school) is 
matched regression estimates. Here a meas- 
ure of job knowledge was used as the criterion 
variable and measures or indications of me- 
chanical aptitude and B-52 aircraft mainte- 
nance experience were used as control vari- 
ables. 

For the purposes of this study job knowl- 
edge was defined as performance on an ex- 
amination developed by Human Factors, In- 
corporated (2). In this examination, which 
is similar to a number of others which have 
been constructed for the Air Force, separate 
scores are obtainable for different areas of 
knowledge in the maintenance of specific 
pieces of Air Force equipment. This par- 
ticular examination is now being used rou- 
tinely by the Strategic Air Command to 
ascertain training needs of B-52 maintenance 
personnel.

The Ss of this study were 184 airplane me- 
chanics working at the “5” skill-level on the 
B-52 aircraft in November of 1956 at the first 
two Air Force bases to be fully equipped with 
that airplane.² All such mechanics available
for daytime duty were tested. They were 
also given a short questionnaire covering train- 
ing and experience items. Mechanical Apti- 
tude Indexes derived from the Airman Classi- 
fication Battery (1) administered during basic 


2 Among aircraft maintenance personnel in the Air 
Force, a specialty code of 43131 identifies an ap- 
prentice or semi-skilled mechanic, a code of 43151 
identifies a skilled mechanic, and a code of 43171 
identifies a maintenance technician at the highest 
skill level. The Ss of this study were 43151’s and 
are referred to here as “mechanics working at the 
‘5’ skill-level” and, at other places in this report, as 
“mechanics at an intermediate level of skill.” 
training were obtained from personnel records
at the respective bases.

Of the 345 mechanics originally tested, 61 
were rejected as potential Ss of this study be- 
cause they had received neither field training 
nor technical school training in the mainte- 
nance of B-52 aircraft. An additional 18 
cases were rejected by reason of incomplete 
data. Airman Classification Battery scores 
were not available for eight potential Ss and 
10 men did not complete some portion of the 
job knowledge test or some portion of the 
questionnaire. Of the remaining 266 mechanics,
174 had received field training and
92 had received technical school training. 

In the final selection of cases an attempt
was made to secure a greater degree of homogeneity
between the field-trained and school-trained
groups by individual-to-individual
matching on prior aircraft maintenance experience.
It was felt that previous maintenance
experience might easily contribute to
the job knowledge of B-52 aircraft maintenance
personnel, and since there was no
statistical control programmed to take account
of this variable, “months on other aircraft”
was used as a basis for matching. As
a result of this procedure an additional 82
cases were lost. In instances when more than
one mechanic could be matched with a particular
man to form a pair, a table of random
numbers was used in making the final choice.
The 184 Ss selected for this investigation were 
comprised, then, of 92 B-52 airplane me- 
chanics at an intermediate level of skill who 
had received field training in B-52 mainte- 
nance matched with 92 B-52 airplane me- 
chanics at an intermediate level of skill who 
had received technical school training in B-52 
maintenance. 

Results 


The basic data for the comparison of the
two groups of B-52 mechanics are presented
in Table 1. The following equations, obtained
from these data in the manner outlined by
Walker and Lev (4, pp. 406-10), are plotted
in Fig. 1:

8.0671X + .6718Z − 54.0951 = 0 [1]

26.00X² + 10.6528XZ − 2.7413Z² − 417.6X − 8.58Z + 1203 = 0. [2]

Table 1
Basic Data for Comparison of Two Groups of B-52 Airplane Mechanics on Job Knowledge

Datum                                Field Trained          Technical School
                                     Mechanics              Trained Mechanics

Number of cases                      N1 = 92                N2 = 92
Sum, scores on job knowledge test    ΣY1 = 11,054           ΣY2 = 11,402
Sum, months B-52 experience          ΣZ1 = 869              ΣZ2 = 948
Sum, mechanical aptitude index       ΣX1 = 524              ΣX2 = 542
Mean, job-knowledge test score       Ȳ1 = 120.1522          Ȳ2 = 123.9348
Mean, months B-52 experience         Z̄1 = 9.4456            Z̄2 = 10.3043
Mean, mechanical aptitude index      X̄1 = 5.6956            X̄2 = 5.8913
Σ(Y − Ȳ)²                            Cyy1 = 113,643.869     Cyy2 = 118,879.6
Σ(Z − Z̄)²                            Czz1 = 2588.7283       Czz2 = 2401.4783
Σ(X − X̄)²                            Cxx1 = 239.4783        Cxx2 = 176.9131
Σ(Y − Ȳ)(Z − Z̄)                      Cyz1 = 4949.7609       Cyz2 = 3760.826
Σ(Y − Ȳ)(X − X̄)                      Cyx1 = 2622.2609       Cyx2 = 661.3479
Σ(Z − Z̄)(X − X̄)                      Czx1 = −62.5217        Czx2 = 33.0435





Equation [1] is the equation for the line of
nonsignificance. It is represented by the
straight line in Fig. 1, and is the locus of
points at which the estimate of difference in
test performance for the two groups is equal
to zero. On the right hand side of the line of
nonsignificance test performance is favorable
to field-trained mechanics. On the left hand
side of that line test performance is favorable
to technical-school-trained personnel. Equation
[2] has been plotted in Fig. 1 to show
curves which define the limits to a range of
values for X and Z where performance on the
job-knowledge test is significantly different
(at the 5% level) for mechanics in the two
groups.
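
The coefficients of Equation [1] appear to be recoverable from the basic data as the difference between the two groups' least-squares regression planes (field-trained minus technical-school-trained). The sketch below is offered under that assumption; it uses ordinary two-predictor regression on the sums, sums of squares, and cross products in the basic data table, not necessarily the computing routine given by Walker and Lev, and all function names are illustrative.

```python
def regression_plane(n, sum_y, sum_x, sum_z, cxx, czz, cxz, cyx, cyz):
    """Least-squares plane Y = a + b_x*X + b_z*Z from summary statistics.

    The C terms are sums of squares and cross products about the means.
    The two normal equations are solved by Cramer's rule.
    """
    det = cxx * czz - cxz ** 2
    b_x = (cyx * czz - cxz * cyz) / det
    b_z = (cxx * cyz - cxz * cyx) / det
    intercept = sum_y / n - b_x * sum_x / n - b_z * sum_z / n
    return intercept, b_x, b_z

# X = mechanical aptitude index, Z = months of B-52 experience.
field = regression_plane(92, 11054, 524, 869,
                         239.4783, 2588.7283, -62.5217,
                         2622.2609, 4949.7609)
school = regression_plane(92, 11402, 542, 948,
                          176.9131, 2401.4783, 33.0435,
                          661.3479, 3760.826)

# Difference of the two planes: d0 + dx*X + dz*Z (field minus school).
d0, dx, dz = (f - s for f, s in zip(field, school))

def estimated_difference(x, z):
    """Estimated job-knowledge difference at aptitude x and experience z."""
    return d0 + dx * x + dz * z

print(f"{dx:.4f}X + {dz:.4f}Z + ({d0:.4f}) = 0")
```

Under these assumptions the positive coefficient on X means that the estimated difference grows in favor of field-trained mechanics as aptitude increases, which agrees with the description of the right hand side of the line of nonsignificance.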

The point at which the variance of the ex- 
pression 8.0671X + .6718Z — 54.0951 attains 
its minimum value represents the center of 
accuracy (CA) for the plot given in Fig. 1. 
It is the point where the observed difference 
is most reliable, and since the center of ac- 
curacy in this instance lies outside both parts 
of the region of significance, the best estimate 
of the true difference between mean perform- 
ance of the groups being compared is zero. 
That the point is slightly to the left of the line 
of nonsignificance is an indication that there 
is a slight difference, though nonsignificant, in 
over-all test performance in favor of the tech- 
nical school trained group. By referring to 
the basic data in Table 1 it can be seen that
mean test score for the field-trained mechanics 
was 120.15 while for the technical-school- 
trained mechanics this value was 123.93. The 
similarity of the two groups in terms of mean 
aptitude index (5.70 versus 5.89) and mean 
experience on the B-52 aircraft (9.4 months 
versus 10.3 months) will also be noted. 


Discussion 


The results of this investigation seem to 
justify, to some extent, the greater emphasis 
presently being placed on field training for 
mechanics responsible for the maintenance of 
one important new weapon system. If there 
is, in fact, no difference in the amount of job 
knowledge acquired by personnel who have 
received maintenance training in either one 
of two generally different kinds of training 
environment, then the choice between the 
training schedules can be more easily made 
on the basis of administrative convenience, 
economy, or other considerations. In the 
present instance field training is seen to be 
relatively economical in terms of manpower 
and money since the same results, on the 
whole, are obtained in less training time. 

At particular levels of mechanical aptitude 
and maintenance experience, however, the results
of this investigation imply that field
training should be reserved for higher aptitude 
airmen with some maintenance experience, and 
that technical school training should be more 
often scheduled for lower aptitude airmen who 
have had little or no actual work experience. 
The last part of this statement may at first 
seem to conflict with the known circumstance 
that a criterion of minimum mechanical apti- 
tude is a useful one on which to screen candi- 
dates for technical school training in the Air 
Force. In Fig. 1, also, the location of the 
left hand part of the region of significance 
might suggest that successful technical school 
training is associated with low aptitude and 
experience. All that is indicated on this 
matter by the results of this investigation, 
however, is that as long as two schedules of 
training such as the ones under immediate consideration
are being retained, and as long as
personnel at high and low levels of aptitude 
and experience become available for assign- 
ment to either of those schedules, then per- 
haps best over-all training results may be ob- 
tained whenever training assignments are ac- 
complished on a selective basis. 

It should be emphasized that the results of 
this investigation pertain to training for the 
maintenance of only one weapon system. 
Similar studies covering other important 
weapon systems must be carried out before 
wider generalizations concerning the relative 
utility of the kind of training schedules con- 
sidered here can be made. 


Summary and Conclusions 


In this investigation a comparison on job 
knowledge is made between mechanics who 
had received field training on an important 
new weapon system and mechanics who had 
received technical school training on the same 
system. With the effects of mechanical apti- 
tude and maintenance experience controlled, 
these two conclusions seem justified: 

1. On the whole, and in the particular kind 
of situation studied, there is no significant 
difference in job knowledge on the part of 
mechanics exposed to the two training environ- 
ments. 
2. Mechanics at higher levels of aptitude
and experience benefit most from field train- 
ing; mechanics at lower levels of aptitude and 
experience benefit most from technical school 
training. 

In connection with the second conclusion, 
ranges of aptitude and experience wherein 
field-trained personnel and technical-school-trained
personnel differ significantly are specified.
Implications for present Air Force school
assignment practices are discussed. 


Received February 25, 1958. 
Reflex-reflective material (“Scotchlite”) is
“that material which has the property of re- 
flecting incident light, from a single source, 
in a relatively narrow cone back toward the 
source” (5). When employed on road signs 
and similar displays, this material is designed 
to provide greater light reflection over large 
areas, without the “glare” induced by metal 
surfaces of similar reflecting power. 

The purpose of the present study is to 
evaluate the use of reflex-reflective material 
to improve legibility of digits viewed from sev- 
eral different angles and distances at night, 
primarily with regard to their potential use 
as aircraft markings. Cannon (2) has con- 
ducted several tests of this material which in- 
dicate that it is sufficiently durable to be used 
for aircraft and ground markings. However, 
if reflex-reflective material could be shown to 
provide legibility superior to that of other 
display materials and surfaces, the range of 
practical applications would extend beyond 
the immediate purpose of the present investi- 
gation. 


Method 


Apparatus. This study consists of two ground ex- 
periments. These experiments were conducted in an 
open field over 1000 ft. in length. 

The test displays were made up of AND (Army- 
Navy-Design) digits (see Fig. 1), 8 X 10 in., placed 
on rectangular placards, 10 X 16 in. (see Fig. 2). 
Four figure-ground display configurations were used: 


1. Black digits painted on a surface of opaque 
white paint; 

2. Black digits painted on a surface of silver re- 
flex-reflective sheeting, #2270; 

3. Black digits on a surface of uncoated aluminum; 


1 This research was supported by the United States Air
Force under Contract AF 33(616)-2844 and monitored
by the Wright Air Development Center, Wright-Patterson
Air Force Base, Ohio. The senior author
was at that time Project Scientist at the Wright Air
Development Center, and in charge of the project.


4. Digits of reflex-reflective material superimposed
on a black painted surface.²


All numerals from 0 to 9 were presented among the 
40 placards used. 

In a given presentation, five placards having 
identical grounds were mounted side by side on a 
support which could be rotated about a vertical axis. 
The center of each placard was 48 in. above the 
ground. The display appeared to the Ss as a series 
of five digits on a plane surface, presented at several
angles of obliqueness during the different trials.
A 5-digit display was chosen because the immediate
memory span for 5 digits is only slightly less than
perfect (8). Using all the digits in a single presentation
could have resulted in errors of memory up to
80%.

A green fixation light { in. in diameter was 
mounted on a support and placed just below the 
center placard. Illumination for the display was 
obtained from a standard Air Force spotlight³ energized
by two series-connected 12-volt batteries kept
at maximum charge through recharging before each 
experimental session. The spotlight was placed at 
the Ss’ position mounted on a rigid support and di- 
rected toward the placards. A reading lamp just be- 
hind the spotlight provided the Ss with the illumi- 
nation needed for recording their responses. 

Procedure. In Experiment I black digits were 
used against reflex-reflective material, white paint, 
or aluminum. Data were collected at viewing dis- 
tances of 144, 218, 330, and 500 ft. and at each dis- 
tance for viewing angles of 90, 60, 40, 27, and 18 
degrees. Viewing angle is the angle made by the line 
of sight and the surface viewed. Thus, when the 
line of sight is normal to the surface, it is being 
viewed at a 90° viewing angle. 

In Experiment II a comparison was made between 
digits made of reflex-reflective material placed against 
a black painted background, and black digits against 
a reflex-reflective background. Only two distances 
were used: 250 and 500 ft. Examination of results 
of Experiment I showed that distances of 218 ft. or 
less afforded so many perfect scores that little clear 
discrimination of differences in legibility of mate- 
rials was obtainable. 

The procedure used for collection of data was
similar for the two experiments. The Ss were seated
in two rows, one behind the other, near the spotlight.
The midpoint of S grouping was used as the
reference point for distance measurement. Eye level
of seated Ss was approximately the same as target
height above the ground.


2 Specification for black and white paints used is
Federal Mil-1-7178.

3 General Utility Lamp, Flash, Signal, Mazda 4501,
5. 3A 26V.


Fig. 1. Set-up stage.


The E provided each S with an answer sheet and 
read these instructions: 


This is a study of the legibility of numerals used 
to identify aircraft. Over there, about one foot 
above that little green light will be a panel of five 





Fig. 2. General experimental layout.










Table 1 


Relative Luminance for Various Materials and for 
Various Viewing Angles* 


Materials Used 


Reflex- White Black 
Reflective Paint Paint 
1.44 

.23 

19 

A7 

14 

14 


Viewing 
Angle 


Alumi- 
num 


8.79 
1.00 
82 
62 
A7 
30 


90° 
85° 
60° 
40° 

’ in 
18° A7 


* When viewing is normal to the target surface, the viewing 
angle is 90°. The white surface at 85° viewing angle was con- 
sidered as the unit reference to avoid problems related to specu- 
lar reflection obtained from measurements made at a 90° view- 
ing angle. An 85° viewing angle was not used as an experi- 
mental condition. 


digits. When I say “Ready,” look steadily in that 
direction. I shall turn a spotlight on the panel 
for about 4 seconds. Your task is to read the 5 
digits. After 4 seconds I shall turn the spotlight 
off, and turn on the reading light on the post be- 
hind you. You are to write down the digits you 
saw in order in the proper boxes on your answer 
sheet. For example, I shall say, “This is number 
one. . . . Ready,” and turn on the spotlight. Sup- 
pose the five digits you saw were two .. . three 
... four... five... zero. You would write 
each of these in its proper little box after number 
one on your answer sheet. You will have plenty
of time to write the five digits in their correct 
order while the men are changing to a new set 
of digits. 

The sets of 5 digits will be made of different 
materials, and will be shown, also, from different 
angles. You will be shown 30 such sets from this 
distance, and then we shall move to a different 
distance. 

If you are not sure of any digit, you should 
guess . . . unless you are just unable to make any- 
thing out at all. That is, if the digit could be 
either a three, or an eight, or a five, you should 
write down which you think it is. But if it could 
be any digit from one to zero, draw a line through 
the little box for that one digit. 

Also, please do not say them out loud, or com- 
pare your results with your neighbor. This is a 
test of the legibility of the digits, and not a test 
of your special ability to read digits. 

Any questions? 

This is number one. . . . Ready. (Expose for 4 
seconds. Turn on the reading light for about 10 
sec. During that time say:) Write down the 5 
digits in the same order in which they appeared. 


The digit displays were set for the successive trials
by the assistant Es located at the digit-support apparatus.
The spotlight was turned off during this
period, rendering the digits indiscernible to the Ss.
A red signal light mounted beside the display apparatus
was flashed to inform the E located near the
Ss that a new series of digits had been set. E then
exposed the digits by spotlight for 4 sec. After this
4-sec. exposure, the small reading lamp behind the
Ss was lighted while they recorded their responses.
Meanwhile, a new series of digits was set in place
for the next trial. The procedure was followed
until 30 exposures had been made at that distance.
The Ss were then moved to the next distance and
the procedure was repeated.

The programing of the digits and the order of 
presentation of the test background materials were 
determined by the use of a table of random num- 
bers with the restriction that each digit must be ex- 
posed for each viewing angle at each distance for 
each material. The angular orientations were pre- 
sented in orderly sequence from 90 to 18 degrees in 
a counterbalanced order. 

Subjects. Thirteen male college students of ap- 
proximately 20 years of age were used as Ss. Each 


Table 2 
Experiment I. Number of Correct Responses, 
All Digits Combined 


(Number of Ss = 13)


(Number of digits per angle = 130)


Number* Percentage 
Dis- —_—— 
tance Angle x = = R Ww \ 


144 90 
60 
40 
27 


125 
130 
128 
130 
113 


129 
130 
128 
128 
122 


128 
125 


99.2 
100.0 
98.5 
98.5 
93.8 


96.2 
100.0 
98.5 
100.0 
86.9 


98.5 
96.2 
98.5 
76.2 
49.2 


128 
126 
126 
117 


76 


130 
130 
127 
117 

68 


98.5 
96.9 
96.9 
90.0 
58.5 


100.0 
100.0 
97.7 
90.0 
52.3 


90.0 
97.7 
72.3 
46.2 
10.8 


129 
121 
116 

79 


109 99.2 
93.1 
89.2 
60.8 


17.7 


83.8 
94.6 
82.3 
26.2 

5.4 


83.1 

51.5 

12.3 
0.0 
0.8 


73.8 
64.6 
33.8 
8.5 
1.5 


20.0 
23.1 
6.2 
0.0 
0.0 


26.2 
0.0 
0.0 
0.0 
0.0 


* The three letters in this line refer to background materials 
as follows: R, reflex-reflective; W, white; A, aluminum. 
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S had both near and far visual acuity of 20-20 or 
better (Ortho-Rater test). Eleven of these 13 Ss 
used for Experiment I were used for Experiment II. 

Calibrations. Target luminance in the field was 
measured with a Luckiesh-Taylor photometer. The 
field measurements for the white surface at 60° view- 
ing angle are listed below for each of the viewing 
distances: 


Distance 144 218 330 500 

Ft. Lamberts aA BS HT 2 
Table 1 gives these measurements as luminance ratios 
for each material at each viewing angle. In this 
table, the white surface viewed at 85° to the surface 
is shown as unit luminance. It was felt that a view-
ing angle of 85° provided the best reference since it
eliminated the specular reflection which occurred at
the normal or 90° viewing angle.


Results 


Table 2 presents the findings obtained from
Experiment I. At distances of 330 and 500
ft., where the height of the digits subtends a
visual angle of 10 sec. and 7 sec. of arc re-
spectively, identification of the digits on back-
ground of reflex-reflective material is clearly
superior to identification made when digits
were placed on aluminum or white painted
backgrounds. At 144 ft. and 218 ft., where
the height of the digits subtends 24 sec. and
16 sec. of arc respectively, there is no evident
superiority of reflex-reflective material over
the white painted background. However, a
clear superiority of reflex-reflective back-
ground over aluminum background is evident
in terms of per cent correct responses at these
shorter distances when the viewing angle is 27
or 18 degrees.


Table 3

Experiment II. Number of Correct Responses,
All Digits Combined

(Number of Ss = 11)
(Number of digits per angle = 220)

                      Number            Percentage
Distance  Angle     R(a)  RD(b)        R(a)    RD(b)

  250      90        --    208          --      94.5
           60       220    220          --     100.0
           40        --    217          --      98.6
           27        --    208          --      94.5
           18        --    186          --      84.5

  500      90        --    198          --      90.0
           60        --    193          --      87.7
           40        --    165          --      75.0
           27        --     87          --      39.5
           18        --     17          0.5      7.7

Combined Data, Experiments I and II

  500      90       269                76.9
           60       213                60.9
           40       116                33.1
           27        26                 7.4
           18         3                 0.9

(a) R, reflex-reflective background.
(b) RD, reflex-reflective digits on black background.


Table 4

Tests of Significance (t) of Mean Differences in
Legibility of Materials with All Five
Angles of Observation Combined

Experiment I

Distance
  144       5.32**
  218       8.38**
  330      20.09**
  500       9.48**

10.62**
2.64*

Experiment II

Distance   RD-R
  250       5.68**
  500      13.65**

* Indicates t beyond the 5% level of confidence.
** Indicates t beyond the 1% level of confidence.

Table 3 summarizes the findings of Experi- 
ment II, and also shows the per cent of cor- 
rect responses represented in the combined 
data from Experiments I and II for all view- 
ing angles at the 500-ft. distance. In Ex- 
periment II no marked differences in per cent 
of correct responses occur, at the 250-ft. dis-
tance, for discriminations of R (black digits
on Scotchlite background) compared with 
those of RD (Scotchlite digits on black back- 
ground) except at the 18° viewing angle. At 
this angle the legibility of the Scotchlite digits 
(RD) appears to be nearly twice that of the 
black digits (R). At the 500-ft. distance, the 
superiority of RD (Scotchlite digits on black 
background) over R (black digits on Scotch- 
lite background) is apparent for all viewing 
angles. This greater legibility of RD at the 
500-ft. distance was least pronounced for the 
normal or 90° viewing angle. 
Table 5


Differential Legibility of Digits at 500 Feet in Terms of Per Cent Correct Responses 
(Data for 90° and 40° viewing angles) 


Digits 


Angle Material 5 6 7 0 


90° A 5. 23.1 0.0 
W d : 38.5 7.7 

R} é : 100.0 61.5 

R? 5.s 36. 59.1 86.4 

100.0 95.5 


46.2 
15.4 
100.0 
100.0 
95.5 


92.3 
53.8 
84.6 
72.7 
100.0 


15.4 
76.9 
40.9 
95.5 


A . 4 0.0 0.0 0.0 0.0 0.0 
W . : 38.5 15.4 0.0 0.0 0.0 
R! 7.7 38.5 46.2 46.2 7.7 53.8 69.2 
R? 36. 50.0 13.6 13.6 54.5 77.3 
RD : 85.4 81.8 90.9 81.8 77.3 


1 Data from Experiment I.
2 Data from Experiment II.


In order to determine whether or not these 
differences were significant, scores in terms of 
number of correct responses at all five angles 
of observation were obtained for each S. The
t test of significance for correlated measures
was applied to the mean number of correct
responses. The obtained t ratios for the dif-
ferences in the mean number of correct re-
sponses for different distances and between
materials are presented in Table 4. 
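
The t test for correlated measures named above can be sketched as follows. This is a minimal illustration of the standard paired-difference computation, not a reproduction of the authors' calculations; the function name and the per-S scores are hypothetical and are not data from the study.

```python
import math
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Return (t, df) for the t test on correlated (paired) measures."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Each S contributes one difference score between the two materials.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd = stdev(diffs)  # sample SD of the differences (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical totals of correct responses per S (assumed values for illustration).
reflective = [48, 45, 50, 47, 44, 49]
white = [40, 41, 43, 42, 39, 45]
t, df = paired_t(reflective, white)
print(f"t = {t:.2f}, df = {df}")
```

Because each S serves under every material, the test is run on the within-S differences rather than on two independent groups.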

The data on the differential legibility of 
the digits for the 90° and 40° viewing an- 
gles at a distance of 500 ft. are presented in 
Table 5. At this distance, and for the 40° 
angle of sight, digits made with reflex-reflec- 
tive sheeting (RD) were found to be superior 
to those made with black paint, except for 
the digit 8, which was slightly more legible 
when presented in black against reflex-reflec- 
tive background (R), and the digit 0, which 
was about equally legible under the two con- 
ditions (R) and (RD). For the 90° angle, 
500-ft. distance, the Scotchlite digits were 
generally more legible in all cases except for 
the digit 7, and the digit 8. Digit 8 showed 
a large reversal in favor of black on Scotch- 
lite background. 


Discussion 


The results of the current experiments in-
dicate that reflex-reflective materials con-
tribute to the legibility of digits viewed at
night from various distances and different 
angles. The extent of the contribution of 
Scotchlite materials to greater legibility was 
a function of both the distance of the target 
and the viewing angle of the S. Digits made 
of reflex-reflective materials placed on a black 
background were more readily discriminated, 
in general, than were digits of black paint on 
a reflex-reflective background. 

Illumination was a pertinent factor in the 
present study. The data were obtained un- 
der nighttime conditions, and the standard 
Air Force spotlight used for illuminating the 
digits provided target luminance from ap- 
proximately .05 mL to 47 mL, depending on 
the materials used and the distance of the 
spotlight from the digits. For a constant 
angular target subtense a decrease in lumi- 
nance may give rise to a decrease in reading 
performance. Hence, a reduction in the lu- 
minance of a target because of its increased 
distance from source of illumination may re- 
sult in a decrease in the readability of the 
target. Moon and Spencer (6), Shlaer (7) 
and others have shown that the steep por- 
tion of the acuity versus luminance function 
(i.e., the portion where decrease in lumi- 
nance causes appreciable decrement in per- 
formance) is in the range from .01 to 10 mL. 
The variations in luminances used in this 
study were primarily in this range. Thus,
the decrement in performance may reflect not 
only the decrease in angular subtense with 
increase in target distance but also the de- 
crease in target luminance with increase in 
target distance. To avoid a performance 
decrement because of low luminance level, 
the illuminating source should provide for a 
target luminance of approximately 1 ft. L. 
If the target is made of reflex-reflective ma- 
terial, the illuminating source need be only 
approximately 1/40th of that required for 
flat white paint. 

Contrast is also an important factor affect- 
ing acuity. Cobb and Moss (3) have shown 
that visual acuity increases as the contrast 
between the object and its background in- 
creases. In the present study the various 
backgrounds provided different contrasts. For 
aluminum at 90° incidence and for reflex- 
reflective materials at all angles of incidence 
the contrast ratios were approximately 95% 
or better. For aluminum at the oblique an- 
gles, the contrast ratios were in the order of 
25%, and for white paint the contrast ratios 
were in the order of 75% (except at the most 
oblique angle, at which a contrast ratio of
55% was obtained). The functions reported
by Cobb and Moss (3) show that for con-
trast ratios greater than 75% not much
change in acuity results from increases in
the ratio. Thus, except for the oblique an-
gles for aluminum and the extreme oblique
angle for white paint, contrast would not 
seem to be a factor affecting performance in 
this study. 

Studies on visual acuity and numeral read- 
ability (1) suggest that factors other than 
the use of reflex-reflective material are im- 
portant in obtaining the optimum legibility 
of digits. For example, doubling the size of 
the digit doubles the distance at which it can 
be read with equal clarity. However, in 
everyday usage, the optimal size of digits is 
limited by the space in which they must be 
placed. This suggests a need for re-evalua- 
tion of the width-to-height ratio of digits 
used where lateral space is unlimited but 
vertical space is limited, and vice versa. 
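
The statement that doubling the size of a digit doubles the distance at which it can be read follows from the geometry of the visual angle. The sketch below assumes the usual relation, angle = 2 arctan(h / 2d); the function name and the sizes and distances are hypothetical and are chosen only to show the relation.

```python
import math


def visual_angle_minutes(height, distance):
    """Angle subtended by an object, in minutes of arc.

    Height and distance must be in the same units.
    """
    if height <= 0 or distance <= 0:
        raise ValueError("height and distance must be positive")
    radians = 2.0 * math.atan(height / (2.0 * distance))
    return math.degrees(radians) * 60.0


# Hypothetical values, not the dimensions used in the study.
base = visual_angle_minutes(height=1.0, distance=100.0)
doubled = visual_angle_minutes(height=2.0, distance=200.0)
farther = visual_angle_minutes(height=1.0, distance=200.0)
print(f"{base:.2f} {doubled:.2f} {farther:.2f}")
```

A digit of twice the height at twice the distance subtends the same angle, whereas the original digit at twice the distance subtends about half the angle.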

It is to be noted from Table 5 that legi-
bility of the digit 8 made of Scotchlite ma-
terial viewed against a black background was
inferior to the same digit made of black
paint viewed against a Scotchlite background. 
Berger (1) found that for white digits against 
black background a stroke-width to height 
ratio of 1 to 13 for the respective digits was 
optimal while for black digits against a white 
background a stroke-width to height ratio of 
1 to 8 was optimal. The 1 to 8 ratio was 
used for the digits in the current study. The 
reversal noted for digit 8 in Table 5 sug- 
gests the desirability of investigating the pos- 
sible improvement of legibility with Scotch- 
lite digits having a stroke-width to height 
ratio of less than 1 to 8. 


Summary 


The legibility of Scotchlite vs. other ma- 
terials was investigated. The target placards 
carried digits similar to those painted on 
present-day aircraft. The study was con- 
ducted under nighttime conditions with a 
standard Air Force spotlight used for illumi- 
nation of digits. 

In Experiments I and II male college stu- 
dents, with normal near and distance vision, 
read sets of 5 digits which were exposed for 
4 seconds per set. The digit-bearing placards 
were presented at viewing angles of 90°, 60°, 
40°, 27°, 18°. 

In Experiment I all digits were black and 
were placed against three different back-
grounds: (a) reflex-reflective (Scotchlite);
(b) white paint; (c) aluminum. The view-
ing distances were 144, 218, 330, and 500 ft.
In this experiment the superiority of the re- 
flex-reflective background was demonstrated 
at extreme viewing angles for all distances. 
The superiority of reflex-reflective back-
ground was also shown for all viewing an-
gles at the 330- and 500-ft. distances.

In Experiment II the legibility of digits 
made of reflex-reflective material placed 
against a black background was compared 
with that of black digits placed against a re- 
flex-reflective background. The Scotchlite 
digits superimposed on a black background 
were found to afford superior legibility at 
extreme angles for the 250-ft. distance and 
at all angles for the 500-ft. distance. At the 
500-ft. distance the greater legibility of the
individual Scotchlite digits was demonstrated
at the 40° angle for all except the digits 8
and 0.

Received February 27, 1958.


References

3. Cobb, F. W., & Moss, F. K. The four variables ... J. Franklin Inst.,
4. Crook, M. N., & Baxter, F. S. The design of
digits. WADC Technical Report 54-262, June
1954.
5. Minnesota Mining & Manufacturing Co. Reflec-
tive characteristic of “Scotchlite.” St. Paul:
Author, undated manual, circa 1943.
Studies in Management Training Evaluation: I. Scaling
Responses to Human Relations Training Cases 1


C. H. Lawshe, Robert A. Bolda, and R. L. Brune 


Occupational Research Center, Purdue University 


A series of studies has been undertaken in 
the Occupational Research Center to evaluate 
certain techniques popularly employed in hu- 
man relations training. In order to determine 
pre- and posttraining performance levels of 
subject groups, the authors decided upon using 
a standard stimulus device analogous to a 
work sample in human relations. A work 
sample was required which would elicit re- 
sponses related to effective handling of social 
interaction situations. 

The stimuli were three commercially avail- 
able sound-slide film cases selected from the 
McGraw-Hill Supervisory Problems in the 
Plant Series. These were: (a) Case of the 
Reddened Eyes, (b) Case of the Reluctant
Electrician, and (c) Case of Ben’s Problem 
Workers. Each of these cases presents the 
development of a human problem situation 
involving a foreman and one or more em- 
ployees and ends at a point where supervisory 
action is required to relieve the situation. 

Two dimensions of primary interest were 
conceptualized. The first was called Em- 
ployee-Orientation—the extent to which an 
S’s proposed course of supervisory action re- 
flects a cognizance of the human problem in 
the case. The second dimension, Sensitivity, 
was defined as the ability to use the informa- 
tion in the film to explain the employee’s be- 
havior. It is the purpose of this article to 
describe a scaling procedure by means of 
which open-end responses to stimulus films can 
be reliably scored. 


Procedure 


Several groups of academic and industrial Ss were 
shown the three films and wrote their responses to 
the following questions: 


1. If you were the foreman in this case, what 
would you do now? 

2. Why did the employee behave the way he (she) 
did? 


1 This research is supported by a grant from the
Foundation for Research on Human Behavior. 
Responses to the first question were scaled on an
Employee-Orientation continuum; responses to the 
second were rated on Sensitivity. 

Sixteen judges who were familiar with the cases 
assigned the responses among nine categories of a 
forced distribution. Because of the time required 
for rating, each response was rated by eight judges.
The rating instructions were:

Orientation Scaling: “. . . A high employee-orienta- 
tion response is one which reflects cognizance of the 
human problem described in the film. This cog-
nizance, as reflected in the course of action selected,
is what we want you to rate. Low employee-orienta- 
tion may be evidenced in task-oriented responses, or 
in answers which tend to avoid the human problem 
presented.” 

Sensitivity Scaling: “. . . scale these responses along 
a continuum of sensitivity to the employee's feel-
ings. A ‘high’ response would be one which reflects
the subject's ability to use subtle social cues presented
in the film to explain the employee’s behavior. A 
‘poor’ response would reflect complete insensitivity, 
or unwarranted value judgments.” 


Analysis 


A summary of the analysis of the rating 
task is presented in Table 1. The authors 
conclude that only the responses to the Case 
of the Reddened Eyes were sufficiently dis- 
criminable to justify its use as a research in- 
strument. The correlation between Orienta- 
tion and Sensitivity scores on this case was 
.56, which indicates that the dimensions are
not independent. 


Extension to Scaling Subsequent Responses 


The research outlined above described the 
scaling of responses obtained in several initial 
subject groups. In order to make use of this 
information in scaling responses obtained in 
subsequent groups, a second scaling study 
was carried out on responses to the Case of the
Reddened Eyes. Thirty-seven new responses 
to each question on this case were obtained, 
and it was desired to attach scale values to 
them. 

Table 1

Means, Variances and Average Rater Intercorrelations of Response Scale Scores

              Reddened Eyes       Reluctant Electrician   Ben’s Problem Workers
              Orient.   Sens.     Orient.   Sens.         Orient.   Sens.
              Scale     Scale     Scale     Scale         Scale     Scale

Mean          4.99      5.01      4.99      4.99          5.00      4.99
Variance      6.65      5.76      6.40      2.52          4.49      3.38
r              .67       .63       .75       .48           .46       .35


Master scale. As an initial step in scaling
these new items, two master scales were con-
structed from scaled responses to the questions
for the Case of the Reddened Eyes. Ten and
13 “bench-mark” responses were selected from
each of the two sets of previously scaled re-
sponses. The criterion of acceptability for a
“bench-mark” response was arbitrarily set in
terms of the range of category scores as-
signed to the response. Any response on
which the range of category judgments was
two or less was selected as a “bench-mark”;
that is, only those responses were considered
on which the eight judges exhibited a high
degree of agreement. Displays were con-
structed, showing a vertical scale graduated
in category scores, with the “bench-mark”
responses keyed into the vertical scale at ap-
propriate points.

Scaling method. Four judges rated the two
new sets of responses to the Case of the Red-
dened Eyes by assigning scale scores accord-
ing to their judgments of the new response’s
position on the master scale (i.e., relative to
the bench-mark items). The judging instruc-
tions were identical to those described previ-
ously. Both “What” and “Why” responses
were scaled in this manner. Judge agreements
on these tasks are shown in Table 2.

Adequacy of ratings. In order to check on
the correspondence of scale values obtained in
this abbreviated method with those obtained
by the forced-sort procedure, 12 responses
to each question were randomly picked from
the previously scaled responses to the Case of
the Reddened Eyes. The four judges slotted
these responses into the master scales, and
scale scores were obtained by averaging the
four new judgments on each response. Rater
agreements on these tasks are shown in Table
2 along with the correlations between the new
scale scores and those assigned in the forced-
sort procedure.

These data indicate that an adequate scal- 
ing job can be done by a smaller group of 
judges using the master scaling scheme. Both 
interjudge agreements and correlations with 
the initial scale values substantiate this con- 
tention. It was noted that in the abbreviated 
rating approach, the judges tended to build 
a constant error into the judgments; on the 
average, the response scores obtained by the 
“keying-in” method tended to be too high by 
approximately 4 scale category. This distor- 
tion can be eliminated by a transformation of 
the scale and is of no consequence unless com- 
parisons are made between new responses and 
those scaled by the forced-sort procedure. 
Such a comparison is not anticipated. 
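
The remark that the constant error can be eliminated by a transformation of the scale can be illustrated with a minimal sketch. It assumes the simplest such transformation, a uniform shift that brings the mean of the keyed-in scores into line with the mean of the forced-sort scores; the source does not specify the transformation, and the function name and scale values below are hypothetical.

```python
from statistics import mean


def remove_constant_error(new_scores, reference_scores):
    """Shift new_scores so that their mean equals the reference mean.

    Returns the adjusted scores and the offset that was removed.
    """
    offset = mean(new_scores) - mean(reference_scores)
    return [score - offset for score in new_scores], offset


# Hypothetical scale values for the same responses under both procedures.
forced_sort = [2.0, 3.5, 5.0, 6.5, 8.0]
keyed_in = [2.6, 3.9, 5.4, 7.1, 8.5]
adjusted, offset = remove_constant_error(keyed_in, forced_sort)
print(f"offset removed: {offset:.2f}")
```

A uniform shift of this kind leaves the correlation between the two sets of scale values unchanged, which is why the constant error matters only when scores from the two procedures are compared directly.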


Table 2


Correlational Results of Abbreviated Scaling Procedure
(Master Scaling Scheme)


Case of the Reddened Eyes

                                          Orientation   Sensitivity
                                          Scale         Scale

Correlations between scale values
obtained by forced-sort and ab-
breviated scale methods

Average judge intercorrelations
on abbreviated scaling of “old”
items

Average judge intercorrelations
on abbreviated scaling of 37
“new” items
Summary and Conclusions


In an attempt to develop a training evalua- 
tion device having applicability to several 
levels and types of management groups, stand- 
ard human relations training cases were se- 
lected to serve as work samples of human 
problem situations. These cases describe the 
development of a problem situation and re- 
quire the supervisory S to propose a course of 
action appropriate for the solution of the 
problem. Following presentation of a film, Ss 
were asked to indicate (a) what they would do 
if they were the foreman, and (b) why the 
employee behaved the way he did. Responses 
to the first question were scaled on an Em- 
ployee-Orientation continuum, reflecting the 
extent to which the course of action proposed 
indicated an awareness of the human problem. 
Responses to the second question were scaled 
on Sensitivity, the extent to which the sub- 
ject’s responses reflected the ability to use 
subtle, social cues presented in the film to 
explain the employee’s behavior. 

A total of 16 judges participated in scaling 


these six sets of responses, using a forced-dis- 
tribution scheme. The average rater inter- 
correlations for the six scaling tasks prompted 
the decision to eliminate the cases of the Re- 
luctant Electrician and Ben’s Problem Work- 
ers from further research. 

A second scaling study was described which 
was directed at scaling new responses to a 
case without duplicating the laborious forced-
sort procedure. Master scales were con-
structed of “bench-mark” responses on which
judge agreement was high, and new responses 
were assigned scores according to their quality 
with reference to the “bench-mark’’ items. 
Judge agreements on this task were shown to 
be adequately high, and comparisons between 
scale values obtained by the abbreviated pro- 
cedure and those obtained in the forced-sort 
approach indicate that the master scale method 
can be utilized with confidence. 
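
The bench-mark criterion described in the procedure (a range of category judgments of two or less among the eight judges) can be sketched as follows. This is an illustration only: the response identifiers and ratings are hypothetical, and taking the mean category as the scale value of a bench-mark is an assumption, not something stated in the text.

```python
def select_bench_marks(ratings, max_range=2):
    """Return {response_id: mean category} for responses the judges agree on.

    ratings maps a response id to the list of category scores (1 to 9)
    assigned by the judges. A response qualifies as a bench-mark when the
    range of its category scores is max_range or less.
    """
    bench_marks = {}
    for response_id, categories in ratings.items():
        if max(categories) - min(categories) <= max_range:
            # Assumed scale value: the mean of the judges' categories.
            bench_marks[response_id] = sum(categories) / len(categories)
    return bench_marks


# Hypothetical category judgments by eight judges.
ratings = {
    "resp_01": [7, 8, 7, 8, 8, 7, 9, 8],  # range 2, kept
    "resp_02": [3, 6, 4, 7, 5, 2, 6, 4],  # range 5, dropped
    "resp_03": [2, 2, 3, 2, 1, 2, 3, 2],  # range 2, kept
}
bench = select_bench_marks(ratings)
print(sorted(bench))
```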

Further information with respect to “re- 
test” reliabilities and scale validity will be 
presented in subsequent articles. 


Received March 3, 1958. 
Matching Indices for Use in Forced-Choice
Scale Construction


Robert F. Morrison


Iowa State College


and Howard Maher


University of Pennsylvania


Investigation of the forced-choice method 
has shown a need for basic research and 
an integration of previous studies. In this 
method, statements are paired for equal ap- 
pearance but unequal discrimination. The 
problem here investigated concerns various 
methods of equating statements for appear- 
ance. If items have several “appearances,” 
the problem of matching may be so great as 
to make forced-choice test construction ex- 
tremely difficult. 

Some work has already been done on this 
problem. Gordon (4) studied four methods 
of equating statements for appearance, utiliz- 
ing both favorable and unfavorable appearing 
statements, while Edwards and Horst (1) 
and Highland and Berkshire (5) each used 
two methods. Wherry 1 utilized both posi-
tive and negative items in a study of many
appearance variables derived from a search 
of rating literature. 

This study is an attempt to integrate the 
results of previous studies and to add to 
them. This is done by analyzing a compre-
hensive list of appearance scales derived from
both a search of the literature and from the 
insights of people making decisions in a 
forced-choice test situation. Only positively 
toned items are used. 

The basic question remains: Is more than 
one appearance index necessary in order to 
equate forced-choice items? 


Procedure 


The 100 items used in this study were
taken by use of a table of random numbers
from a total of 336 items developed in an
Office of Naval Research project.2 The ONR
project started with an attempt to gain em-
pirical evidence as to the importance of vari-
ous personality dimensions. More than 500
students were asked to name and give exam-
ples of the five most admired and five most
disliked personality characteristics. A panel
of three judges categorized these statements
into 12 areas representing general personality
characteristics. Statements typical of each
area were then used to form 12 50-word de-
scriptions, one for each characteristic. A
group of 300 students ranked the 12 descrip-
tions in order of importance of the charac-
teristics, as seen in their associates. A cate-
gory named “Friendliness and Cooperation”
was ranked as the top one. Consequently
this was chosen as the category for study.
The 336 cooperation items were accumulated
for this category from four sources: the
original group of statements, a new group of
200 students who listed examples of “Friend-
liness and Cooperation,” existing personality
tests, and the test constructors themselves.

In the initial step of the present investiga-
tion a short “test” was constructed to be em-
ployed in establishing “beating” hypotheses.
The 100 items were reduced to 40 by further
use of a table of random numbers. These
were placed in 10 forced-choice blocks of four
items each, the matching of items being per-
formed by a three-man panel utilizing a find-
ing by Ghiselli (3) in which a forced-choice
scale was constructed by inspection alone.
The “test” was administered to 20 fraternity
men, each being asked to rank the four items
in each block in the order which would give
him the highest score in applying for fra-
ternity admission.


1 Wherry, R. J. Information on his investigation
of forced-choice appearance indices in a personal
communication to H. Maher, 1952.

2 This project was N7 ONR 37103; however, the
opinions and assertions contained herein are those
of the writers and are not to be construed as offi-
cial or representing the views of the Navy Depart-
ment or the Naval service at large.

Once the testee had completed the test, he 
was asked to state what rationale was used 
in determining his ranking within each block. 
Each choice the interviewee made was fully 
discussed, the entire interview being tape re- 
corded. The interviews were transcribed and 
copies were distributed to a six-man panel of 
staff and graduate students working inde- 
pendently to form “beating” hypotheses. 

From the panel’s discussion, 10 general 
categories were formed which could be rated 
on a scale continuum, had been most fre- 
quently mentioned in the interviews, were be- 
lieved important in rating cooperation and 
friendliness statements, and were thought ap- 
plicable to forced-choice scale construction. 
Some were also backed by information de- 
rived from previous studies by Wherry (see 
Footnote 1) and Gordon (4). Finally five- 
point scales were constructed to represent the 
categories. The 10 scales, more completely
described by Maher (6), were Group Cen-
teredness, Basic Value, Restriction, Desir-
ability, Clarity, Breadth of Coverage, Lead-
ership, Practicality, Sincerity, and Activity.


An example of the scaling used is:

Breadth of Coverage

5 This item is a very general and all-inclu-
  sive one. It covers several, more specific
  characteristics of people.
4 This item is a somewhat general item.
3 This item is really neither general nor
  specific.
2 This item is a somewhat narrow and spe-
  cific one.
1 This item is extremely specific, describ-
  ing only a single, narrow characteristic
  of a person.


An eleventh scale, Certainty of Observation, 
was based on Wherry’s (see Footnote 1) 
Factor II, named Certainty of Observation. 

Next, the 100 items were arranged into 
booklets, being presented, at this stage, not 
in forced-choice form but as items to be 
judged singly on the five-point scales previ- 
ously mentioned. As a pilot study to deter- 
mine the homogeneity of fraternity and non- 
fraternity samples, the booklets were given 
to a group of 11 fraternity and 24 nonfrater-
nity men with directions for rating each item
on a trial scale (Breadth of Coverage) 
picked at random from the other 10. Since 
the product-moment coefficient between the 
item means was .76, the two groups were as- 
sumed to be comparable. This finding served 
as the basis for the method of data collection 
used in the next step in which both fraternity 
and nonfraternity subjects were combined. 
Copies of the 11 sets of previously described 
directions and the 100-item booklets were 
randomly distributed, so that each item was 
rated 36 times on each scale. Three hundred 
ninety-six students served as item raters. All 
papers having omissions or dual answers were 
eliminated and the number of raters reduced 
to 33 per scale by either the above elimina- 
tion or by randomization. Frequencies were 
then obtained for each item on each scale, 
and a mean scale value for each item was 
calculated. Using these means, the intercor- 
relations for the first 11 scales were calcu- 
lated and placed in the correlation matrix. 
Two additional scales were “borrowed” 
from the above-mentioned ONR project (6). 
In that study 1029 fraternity men had been 
ranked by their fraternity brothers. The in- 
trafraternity reliabilities of this ranking cri- 
terion ranged from .83 to .97 with only four 
below .90. The mean odd-even reliability 
corrected by the Spearman-Brown formula 
was .93 for all fraternities. These men were 
then split by fraternities into three groups 
equated for reliability coefficients and sam- 
ple number. One of these groups was used 
to determine item Preference and Discrimi- 
nation Indices. The others were retained for 
the ONR forced-choice validation and cross- 
validation steps. For the Preference scale, 
150 raters 3 judged each of the 100 items on
how well it described him. The Preference
Index for an item was the mean self-descrip-
tion for all 150 raters. The Discrimination
Index for an item was the mean Preference
score on the item for the top one third of the
group (as derived from the intrafraternity
rankings) minus the mean Preference score
on the item for the bottom one third of the
group. Intercorrelations were found between
these two scales and each of the previously
described 11 scales to fill out the correlation
matrix to its full 78 intercorrelations. Fi-
nally, the matrix was factor analyzed to yield
five factors.


3 Three hundred judges, reduced from 343 by at-
trition due to graduation and school drop-out since
the ranking procedure was done, had been retained
for item rating purposes. To save time, however,
the 336 items had been split into two equal parts of
168 items each. These parts were given out alter-
nately in each fraternity to maintain an equal N in
each house.
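
The definitions of the Preference and Discrimination Indices given above can be sketched as follows. This is a minimal illustration under stated assumptions: each rater is represented by a hypothetical pair of criterion rank (1 = highest in the intrafraternity ranking) and self-description rating, the thirds are formed by integer division, and the data are invented for the example.

```python
from statistics import mean


def preference_index(self_ratings):
    """Mean self-description rating on one item over all raters."""
    return mean(self_ratings)


def discrimination_index(ranked_ratings):
    """Mean rating of the top third minus mean rating of the bottom third.

    ranked_ratings is a list of (criterion_rank, self_rating) pairs,
    where rank 1 is the highest-ranked rater.
    """
    ordered = sorted(ranked_ratings, key=lambda pair: pair[0])
    third = len(ordered) // 3
    if third == 0:
        raise ValueError("need at least three raters")
    top = [rating for _, rating in ordered[:third]]
    bottom = [rating for _, rating in ordered[-third:]]
    return mean(top) - mean(bottom)


# Hypothetical data for a single item: (criterion rank, self-rating).
item = [(1, 5), (2, 4), (3, 5), (4, 3), (5, 4), (6, 3), (7, 2), (8, 3), (9, 1)]
pref = preference_index([rating for _, rating in item])
disc = discrimination_index(item)
print(f"Preference = {pref:.2f}, Discrimination = {disc:.2f}")
```

An item endorsed equally by high- and low-ranked raters yields a Discrimination Index near zero whatever its Preference Index, which is the distinction between the two scales that forced-choice pairing relies on.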


Results 


Because of its large number of negative in- 
tercorrelations with the other scales, the Re- 
striction scale was reversed to make for easier 
interpretation. It then became a Non-Re- 
stricted or Universal Behavior scale, and the 
signs of its intercorrelations were changed to 
make nine positive values and only three 
negative values. 

The matrix of scale intercorrelations 
(Table 1) was analyzed using Fruchter’s 
(2) procedure for the centroid method. Two 
criteria, Humphrey’s Rule (2, p. 79) and 
Tucker’s Phi tests (2, p. 77), were used for 
stopping after the extraction of five factors. 

The factors, rotated four times, are shown 
in Table 2. Rotation was carried to a com- 
promise between psychological meaning and 
simple structure. However, loadings were so 
heavy on unrotated Factor I that it was be-
lieved best to maintain it as a general one.
This may also be supported by examination 
of the original matrix, the intercorrelations 
apparently being high enough to support the 
hypothesis of a general factor. Furthermore, 
in the light of Gordon’s (4) findings, the 
concept of a general factor would seem par- 
simonious. 

The rotated factors found in the study are: 

I. Social Desirability—failing to have major
loadings (see Footnote 4) only on Group Centeredness
and Breadth of Coverage, and containing 
59% of the total variance. 

II. Universal Behavior—composed of a 
high positive loading (.60) on Universal Be- 
havior and appreciable negative loadings on 
Leadership (— .50) and Preference (— .40). 

III. Undesirable Activity—having a posi- 
tive loading on Activity (.32) and negative 
loadings on Leadership (— .30) and Desir- 
ability (— .38). 

IV. Breadth of Coverage—containing load- 
ings only on Breadth of Coverage (.66) and 
Certainty of Observation (.47). This is the 
only high loading obtained for the former 
scale. 

V. Nonvalid Preference—containing only
one appreciable loading, i.e., the Preference
loading (.48), but the tendency for a negative
loading on Discrimination (or validity) should
be noted.

4 A level of .30 was chosen as the criterion for inclusion
in a factor.


Table 1
Original Intercorrelation Matrix (Upper Half of Table) and Matrix of Residuals After
Extraction of Five Factors (Lower Half)

A. Group Centeredness
B. Basic Value 00
C. Universal Behavior —12
D. Desirability 03
E. Clarity 09
F. Breadth of Coverage —O08 00
G. Leadership —03 —O1
H. Practicality 01 03 02
I. Sincerity —04 o —-0Ol
J. Activity 00 04 00
K. Certainty of Observation —04 03 —03
L. Preference OF 05 02
M. Discrimination 06 02 01

G
04
56
49 5 64
33
006 06
-04 —11 + 32
00 05 05 7 : 66
02 03 03 — 5 ‘ 77
03 —04 03 —06 71
—O8 02 —01 02 36
—O8 07 07 —03 03
—06 06 —02 04 —06 —01 —08

Note.—Two-place decimals have been omitted from the table, i.e., results rounded to two significant figures, decimal point
omitted.


Table 2
Factor Loadings After Rotation

III IV

Group Centeredness
Basic Value
Universal Behavior
Desirability
Clarity
Breadth of Coverage
Leadership
Practicality
Sincerity
Activity
Certainty of Observation
Preference
Discrimination

12 15
—28 33
27 33
—38 34
—28
17 66 20
—30 —25
—09 04 —05
18 08 —03
32 03 02
18 47 21
17 —11 48
—21 25 03 —27

Note.—The factors are: Social Desirability (I), Universal Behavior (II), Undesirable Activity (III), Breadth of Coverage (IV),
and Nonvalid Preference (V). Two-place decimals have been omitted from the table.
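The .30 inclusion criterion given in Footnote 4 can be illustrated with a short sketch. It assumes the criterion applies to the absolute value of a loading and is inclusive at .30 (Leadership at —.30 is listed under Factor III, which suggests this). The function name and the extra "Hypothetical scale" entry are inventions for illustration only.

```python
def major_loadings(loadings, criterion=0.30):
    """Keep scales whose absolute loading reaches the inclusion criterion."""
    return {scale: value for scale, value in loadings.items()
            if abs(value) >= criterion}


# Factor III loadings quoted in the text, plus one made-up small loading.
factor_iii = {
    "Activity": 0.32,
    "Leadership": -0.30,
    "Desirability": -0.38,
    "Hypothetical scale": 0.05,  # illustrative value, not from the study
}

# Only the three scales named in the text pass the criterion.
print(sorted(major_loadings(factor_iii)))
```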


Discussion 


What interpretation can be obtained from 
the foregoing results? Factor I (here called 
Social Desirability) supports the findings of 
Gordon (4) and Edwards and Horst (1), i.e.,
regardless of what we call our scales the per- 
son reacts to the items in terms of general 
social desirability or appearance of the items. 
The loading on Preference (.80) indicates 
that the matching scale used since the in- 
ception of forced-choice scales is generally 
satisfactory. The finding is a fortunate one 
in terms of economy of scale construction, 
i.e., investigators apparently can match on 
Preference and have some assurance that 
they are thus controlling on general appear- 
ance of the items. Note also that an ele- 
ment of the Discrimination Index rides along 
with the factor. As computed here, and in 
many other instances, the Discrimination 
Index and the Preference Index are not in- 
dependent of each other. However, the load- 
ing is small enough in Factor I so that it is 
still possible to pair for appearance and have 
discrimination different (as it would not be 
possible if r were 1.00, and difficult if r were 
appreciably high). Thus general appearance 
is not a giveaway to discrimination. With 


higher loadings here a testee or ratee might 
be able to detect the valid item by spotting 
the “better looking” item (where matching 
on appearance was not perfect). 

Factors II, III, IV, and V are much weaker 
than Factor I since their combined variance 
is still much less than the variance for that 
factor alone. However, they also are of pos- 
sible aid in the construction of forced-choice 
items. Factor II may indicate that some 
items can be so common they are to be con- 
sidered undesirable. Factor III has very 
limited loadings so its interpretation is not 
clear, but Factor IV may indicate that items 
describing general behavior would give the 
testee or ratee more opportunity to observe 
the characteristic in himself. Factor V would 
seem to be expressing items with popularity 
but with negative validity, “suppressor” items, 
items with favorable appearance which, if en- 
dorsed, serve to lower the S’s score. 

This study somewhat parallels the work of 
Wherry (see Footnote 1). His largest factor, 
Positive Emotional Tone, resembles Factor I, 
Social Desirability. Failure of this study to 
completely verify Wherry’s factors may have 
arisen because of procedural differences such 
as different item sources, use of positive items
only, and rotation differences. Following
matrix appearance, our first factor, Social
Desirability, was kept high, and rotation was
merely used in “cleaning up” the factors. 

Above all, however, the major finding of 
this study, i.e., the general factor, would indicate
that forced-choice scale construction
can remain a relatively simple procedure. 
Thus items may be matched on a social de- 
sirability index with some assurance that 
many other “appearances” will not also re- 
quire equating. 


Summary 


A study was made of 12 “appearance”
scales for possible use in forced-choice scale
construction. Both interviews and a review
of the literature were sources of the scales.
The interviews were conducted with fraternity
men who were asked what criteria they
had used in attempting to “beat” a forced-choice
test set up supposedly to screen fraternity
applicants. The review of literature
produced two scales, Preference and Certainty
of Observation, while the interview introduced
seven more scales, Group Centeredness,
Activity, Leadership, Breadth of Coverage,
Restriction, Practicality, and Sincerity.
Together the literature and interviews produced
the remaining three scales, Basic Value,
Clarity, and Desirability.

The Discrimination Index was then added 
to the 12 appearance scales. From mean 
item ratings, product-moment intercorrela- 
tions were calculated, and the correlation 
matrix was factor analyzed. Rotation was
minimal and served only to clear up the interpretation
of the five factors obtained. The
finding of a general factor, supported by
previous studies, brings an element of economy
to forced-choice scale construction, tending
to support the pairing of items on only
one appearance index. Because of its high
loading, the most commonly used index,
Preference, seems to be justified in its usage.


Received March 4, 1958. 
This study examined some differential effects
of group decision, group discussion, and
their interaction. Three hypotheses were 
tested: 


1. Group discussion promotes coalescence 
(or increased agreement), effectiveness, and 
change. 

2. Group decision, per se, does likewise. 

3. A combination of discussion and decision 
yields the greatest amount of coalescence, ef- 
fectiveness and change. The absence of both 
discussion and decision produces the least 
coalescence, effectiveness and change. 


The dependent variables were measured by 
the correlations between and within a series 
of true rank orders and rank order judgments 
by members before and after experimental 
treatment. These correlations and their 
change yielded the measures—coalescence, 
stability, and effectiveness. 

There were four treatments. Five groups 
only discussed the rankings, five groups 
reached decisions without discussion, five 
groups did both while five others did neither. 

Aside from its practical significance, the 
problem has some theoretical import. A 
theory of leadership proposed elsewhere (2) 
assumes that most change in a group faced 
with a problem occurs due to interaction; rela- 
tively little need be attributed to isolated 
problem solving unless special feedback condi- 
tions occur within the problem itself or the 
collection of individuals is irrelevant to need 
reduction by the individuals. Participation 
or interaction among members should produce 
change and the consequences of change— 
coalescence and effectiveness. Members without
opportunity to interact should change very
little in behavior.

1 This study was assisted by Austin W. Flint working
under Contract N7ONR 35609, Group Psychology
Branch. The study also received support from
the Louisiana State University Council on Research.
The investigators wish to thank Donald J. Lewis for
editorial assistance.

2 Now at the University of Alabama.

Secondly, members of a group will be ex- 
pected to change in a direction they perceive 
is more rewarding. If members are attracted 
to the group, at least minimally, they will 
tend to accept group opinion as more likely 
to bring them reward than if they adhere to 
their own. But this cannot be done if the 
group’s opinion never is stated. Participation, 
per se, can give some clue to the group’s opin- 
ion, but a stated decision will provide the 
clarity promoting member change. 


Earlier Studies 


The effects on group behavior of decision- 
making and discussion have been studied 
earlier, but many previous investigations have 
been field studies with less experimental con- 
trol possible than in the laboratory. Many of 
the laboratory studies did not attempt to 
separate the effects of discussion from group 
decision-making. For example, as early as 
1914, Munsterberg (14) reported that individuals’
judgments of the number of dots on
cards were more correct after participation in 
a group discussion. While Burtt (6) also 
found that discussion promoted change, he 
noted that average effectiveness was not in- 
creased. Radke and Klisurich (15) observed 
that mothers of new-born infants who engaged 
in discussion among themselves under the 
leadership of a dietician, eventually reaching 
group decisions coinciding with the usual 
recommended procedures, adopted the desired 
behavioral patterns much more effectively 
than a control group receiving individual in- 
struction. 

Anderson (1) described the presumed fore- 
stalling by discussion of a strike of union 
members in a Detroit factory. A meeting was 
held of management and union committeemen 
during which the offended members were 
allowed to discuss their grievances openly. 





Effects of Decision and Discussion 


Faced with resistance by pajama factory 
workers to changing methods, Coch and 
French (7) compared a control group of op- 
erators, taught the usual way, with two ex- 
perimental groups of workers. The new 
method was accepted more readily in both 
experimental situations where the workers 
themselves or worker representatives were per- 
mitted to discuss and decide on the changes in 
method. Similarly, Levine and Butler (8) 
reduced the “halo” in merit ratings assigned 
by foremen significantly more than in control 
groups by permitting them to discuss and 
make decisions regarding more realistic evalua- 
tions. 

In a study designed to reveal the effects 
of discussion, decision, commitment and con- 
sensus on the willingness of students to par- 
ticipate in experiments, Bennett (5) found: 

1. Group discussion was no more effective 
than lecture or no influence at all in producing 
the desired action. 

2. More Ss volunteered from among groups 
required to make a decision than from those 
who were not. 

3. Public commitment was no more effective 
in producing the response than was private 
commitment. 

4. A high degree of group consensus or 
agreement regarding the decision to volunteer 
was more effective than was a low degree. 

Yet, Bennett suggested that many uncon- 
trolled factors, such as salience of subject 
matter and variations in group cohesiveness, 
may have been operative in this field situa- 
tion, making generalization difficult. 

In the same vein, McKeachie (12) con- 
trasted 3 conditions: group discussions fol- 
lowed by decisions; lectures followed by the 
announcement of the results of a secret ballot;
and lectures with votes not announced. But
the effects of discussion in the absence of
decision were not examined. Results indi- 
cated that members shifted opinion in the 
direction they perceived the group as a whole 
was changing. 


Method 


Subjects. Twenty groups of five Ss were recruited 
from classes of elementary psychology students at 
Louisiana State University. Incentives were the ad- 
dition of one point to the course grade of each 
volunteer, and monetary awards of $30 for the
group with the highest final accuracy scores and $6
for each of the next five most successful groups.
Volunteers were assigned to groups randomly within
scheduling limitations. Groups were assigned treatments
randomly.

Materials and apparatus. Ten sets of names of 
five cities comprised the material to be ranked before 
and after treatment. The “mark-sense-to-electronic 
calculator” method described elsewhere (4) was used 
to register the rankings. 

Procedure. For a given problem, Ss under all four 
treatments registered their initial private opinions 
about the rank order of size of population of five 
cities. They registered their opinions again following
an intervening period of approximately 3 min.,
but each treatment involved different conditions during
the intervening period. There were 10 such
problems. Each successive problem involved the
names of five new cities. Five groups of five Ss each
underwent one of the four intervening conditions:
discussion-decision; discussion-no decision; decision-no
discussion; and no discussion-no decision.

Discussion-decision groups discussed the rank order 
of cities for approximately 3 min. to reach a group 
decision announced by one of the members. Dis- 
cussion-no decision groups discussed the rankings 
without instruction to reach or announce a group 
decision. Decision—no discussion Ss were assigned 
the irrelevant Social Acquiescence Scale (3) for 2 
min., after which a group decision was obtained by 
secret ballot and announced. Ss under “no discus- 
sion—no decision” were assigned the irrelevant Social 
Acquiescence Scale for the full 3 min. between rank- 
ings. The correct rankings never were presented to 
any of the Ss. 
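The random assignment of the 20 groups to the four intervening conditions, five groups per condition, could be carried out along the following lines. This is a minimal sketch, not the procedure the investigators actually used: the function name, the group numbering, and the `seed` parameter are assumptions made for illustration.

```python
import random

TREATMENTS = [
    "discussion-decision",
    "discussion-no decision",
    "decision-no discussion",
    "no discussion-no decision",
]


def assign_treatments(n_groups=20, seed=None):
    """Randomly assign groups to treatments with equal numbers per treatment."""
    rng = random.Random(seed)
    per_treatment = n_groups // len(TREATMENTS)
    # One slot per group: each treatment appears per_treatment times.
    slots = [t for t in TREATMENTS for _ in range(per_treatment)]
    rng.shuffle(slots)
    # Groups are numbered 1..n_groups (a hypothetical labeling).
    return {group: treatment for group, treatment in enumerate(slots, start=1)}
```

Shuffling a fixed list of slots, rather than drawing a treatment independently for each group, guarantees the balanced design of five groups per treatment.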


Analysis of Results 


The mark-sense cards on which Ss recorded
their rankings for all 10 problems, following
machine conversion and calculation, yielded
the data on coalescence, effectiveness and
change for statistical treatment using the group
as the unit of analysis (see Footnote 3). The analyses of
variance calculated for each of the three dependent
variables of coalescence, effectiveness,
and change ignored the error variance within
groups due to differences between trials and
between Ss within the same group and examined
scores averaged for all 10 problems.
The estimates of error were inflated by these
within-group errors.

3 Calculations were completed by courtesy of the
Data Processing Laboratory at the Esso Baton Rouge
refinery on an IBM 650 using an instruction deck described
elsewhere (4).


Table 1

Mean Coalescence, Stability, and Effectiveness as a
Function of Discussion and/or Decision

Measurement      Condition      Discussion   No Discussion   Both

Coalescence      Decision          .38           .23          .30
                 No-Decision       .30          —.01          .14
                 Both              .34           .11          .22

Stability        Decision          .70           .84          .77
                 No-Decision                     .92          .84
                 Both                            .88          .81

Effectiveness    Decision                        .06
                 No-Decision                    —.01
                 Both                            .03          .05


Coalescence. Within a given group, for a
given problem, coalescence was the mean rho
correlation among members’ final rankings of
five cities minus the mean rho correlation among
their initial rankings of these same
cities. The value was positive if members
increased in agreement with each other;
negative if they agreed less finally than initially
on the average problem.
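The coalescence measure can be sketched as follows. The sketch assumes that "mean rho correlation" refers to the average Spearman rank correlation over all pairs of members in a group, and that rankings contain no ties; both are assumptions, and the function names are illustrative only.

```python
from itertools import combinations


def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two untied rankings of the same items."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def mean_agreement(rankings):
    """Average rho over every pair of members' rankings."""
    pairs = list(combinations(rankings, 2))
    return sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)


def coalescence(initial, final):
    """Final agreement minus initial agreement; positive means more agreement."""
    return mean_agreement(final) - mean_agreement(initial)


# Two members who start in complete disagreement and end in complete agreement.
initial = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]]
final = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]
print(coalescence(initial, final))
```

In this extreme example agreement moves from rho = -1 to rho = +1, so the coalescence is 2.0, the largest value the measure can take.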

The 10 groups permitted discussion ex- 
hibited for all 10 problems a mean coalescence 
of .34 while the 10 groups denied discussion 
exhibited a significantly lower mean (at the 
5 per cent level) of .11 according to a t test.
Similarly, the 10 groups reaching announced
decisions had a mean coalescence of .30, significantly
higher (at the 5 per cent level) than
the mean of .14 attained by groups reaching
no decision. Greatest coalescence (.38) occurred
under a combination of discussion and
decision, while least coalescence (—.01) appeared
when both were absent. However, according
to t tests the mean coalescence of .30
for discussion alone was not significantly 
higher than the mean of .23 for decision alone. 
Yet, the combination was significantly more 
productive than either alone while either alone 
yielded significantly greater coalescence than 
a complete absence of both. The results are 
summarized in Table 1 while the appropriate 
analyses of variance are displayed in Table 2. 
The standard error of the difference between 
means and the error variance was estimated 
using the mean squares due to groups within 
treatments since the interaction among treat- 
ments was not significant. 
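The marginal means quoted above follow from the four cell means, since each treatment combination contains the same number of groups. A small check of that arithmetic is sketched below; the cell values are those quoted in the text, while the dictionary layout and function name are illustrative assumptions.

```python
# Cell means for coalescence as quoted in the text.
CELLS = {
    ("discussion", "decision"): 0.38,
    ("discussion", "no decision"): 0.30,
    ("no discussion", "decision"): 0.23,
    ("no discussion", "no decision"): -0.01,
}


def marginal_mean(cells, level):
    """Average the cells that share one level of one factor (equal cell sizes)."""
    values = [v for key, v in cells.items() if level in key]
    return sum(values) / len(values)


for level in ("discussion", "no discussion", "decision", "no decision"):
    print(level, round(marginal_mean(CELLS, level), 3))
```

The computed values agree with the reported .34, .11, .30, and .14 to within rounding of the two-decimal cell means.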

Presence or absence of discussion seemed to 
exhibit slightly more effect (.34 vs. .11) than 
presence or absence of decision (.30 vs. .14). 
This may have been due to the fact that in 
discussion groups not permitted announced 
group decisions, discussion itself gave partial 
information about the opinion of the group as 
a whole. On the other hand, the groups per- 
mitted decisions but no discussion had to ac- 
cept the experimenter’s summary of the secret 
ballot as true before allowing their opinions 
to be influenced. 

Stability. On a single problem involving
ranking five cities, the stability of an S’s
opinion was indexed by the correlation of his
initial and final ranking. As seen in Table 1,
Ss changed their opinions, on the average, to
the same degree as they coalesced in opinion
under the different treatments (the higher the
tabled correlation, the less change in a member’s
decision). Discussion, per se, produced
significantly more change at the 1 per cent
level of confidence. Decision did likewise at
the 5 per cent level. Again a combination
yielded the greatest change (.70) while least
occurred where both decision and discussion
were absent (.92).


Table 2

F Ratios Analyzing the Variance of Coalescence, Stability, and Effectiveness

Source                        Coalescence   Stability   Effectiveness

Discussion-No Discussion         33.3**       25.7**         3.4
Decision-No Decision             26.7**        5.0*           .6
Interactions of Treatments        3.9                        3.0
Groups Within Treatments
Total

* p < .05.
** p < .01.

Effectiveness. This was the extent to which the
average S improved in accuracy. Improvement
was the gain in correlation of an S’s final 
ranking with the correct ranking of the cities’ 
size (according to the 1950 Census) as com- 
pared with the correlation of his initial rank- 
ing with the correct ranking of cities. While 
the trends in means (Table 1) were in the 
same direction as for coalescence and change, 
the results failed to attain statistical signifi- 
cance using the conservative two-tailed test. 
But this test should be considered as conserva- 
tive since the hypothesis examined was that 
the discussion and decision samples would 
exhibit higher—not merely different—means 
than the control sample. With this in mind, 
the means for the 4 treatments were compared
using one-tailed t tests. Each of the
three experimental discussion and/or decision 
means of .06, .10, and .06 was significantly 
higher at the 5 per cent level than the mean 
of the control sample (—.01) in which no 
discussion or decision was permitted. Effec- 
tiveness seemed minimized where both decision 
and discussion were absent. 
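The stability and effectiveness measures for a single S on a single problem can be sketched in the same way as coalescence. The sketch assumes the correlations are Spearman rank correlations between untied rankings; the function names and the example rankings are hypothetical.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two untied rankings of the same items."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def stability(initial, final):
    """Correlation of initial with final ranking; higher means less change."""
    return spearman_rho(initial, final)


def effectiveness(initial, final, correct):
    """Gain in agreement with the correct ranking from initial to final."""
    return spearman_rho(final, correct) - spearman_rho(initial, correct)


# Hypothetical S who swaps the top two cities at first, then gets all five right.
correct = [1, 2, 3, 4, 5]
initial = [2, 1, 3, 4, 5]
final = [1, 2, 3, 4, 5]
print(stability(initial, final), effectiveness(initial, final, correct))
```

In the study these per-problem values were averaged over Ss and over the 10 problems before the group means were compared.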


Summary and Conclusions 


Twenty groups of five subjects per group 
were divided randomly into four treatment
categories. Each S twice ranked privately 10
sets of five cities in the order of population 
size. Treatment differences were due to dif- 
ferences in activity intervening between the 
initial and final ranking for each of the 10 
problems. 

Five groups discussed a problem for 3 
min., reaching a group decision announced by 
one of the members. Five groups discussed 
the problem but announced no group decision. 
Five groups engaged in an irrelevant task for 
two minutes and voted secretly on the true 
ranks of the cities, after which the votes were 
counted and the group decision announced. 
Five groups simply continued the irrelevant 
intervening task for the full three minutes and
reranked the cities. 
Three measures examined were: (a) coalescence,
or the increase in agreement among
members of a group; (b) effectiveness, or the
difference between initial and final accuracy
of each member; and (c) stability, or the extent
to which each member did not change his opinion.

The results indicated that: 


1. Coalescence was increased by group dis- 
cussion, group decision and most of all by 
the combination of both treatments. 

2. Change of opinion was significantly
greater for groups permitted either discussion
or decision, although the effect was much
less pronounced with group decision alone.
Again, greatest change occurred when both 
were permitted. 

3. One-tailed t tests suggested that effectiveness
was greater under decision and/or
discussion treatments than when neither was 
permitted. 


The present study supports earlier findings 
concerning the efficacy of both group par- 
ticipation and group decision-making. The 
discrepancy with Bennett's (5) results may
be a consequence of differences in subject 
matter and criteria. 

The findings are consistent with the assump- 
tion that changes and effectiveness in groups 
primarily result from interaction among mem- 
bers. They also tend to substantiate the 
deduction that clarifying the group decision 
implements the effects of more extended in- 
teraction. 


Received March 10, 1958 


References 


1. Anderson, K. A Detroit case study in the group
talking technique. Personn. J., 1948, 27, 93-98.

2. Bass, B. M. Outline of a theory of leadership
and group behavior. Louisiana State Univer.,
1955. (Tech. Rep. 1, Contract N7ONR 35609.)

3. Bass, B. M. Development and evaluation of a
scale for measuring social acquiescence. J.
abnorm. soc. Psychol., 1956, 53, 296-299.

4. Bass, B. M., Gaier, E. L., Farese, F. J., & Flint,
A. W. An objective method for studying behavior
in groups. Psychol. Rep., 1957, 3, 265-280.

5. Bennett, E. B. The relationship of group discussion,
decisions, commitment, and consensus
in “group decision.” Hum. Relat., 1955, 8,
251-273.

6. Burtt, H. E. Sex differences in the effect of discussion.
J. exp. Psychol., 1920, 3, 390-395.

7. Coch, L., & French, J. R. P. Overcoming resistance
to change. Hum. Relat., 1948, 1, 512-532.

8. Levine, J., & Butler, J. Lecture vs. group decision
in changing behavior. J. appl. Psychol.,
1952, 36, 29-33.

9. Lewin, K. Group decision and social change. In
T. M. Newcomb & E. L. Hartley (Eds.),
Readings in social psychology. New York:
Holt, 1947, pp. 330-344.

10. Lindquist, E. F. The design and analysis of
experiments in psychology and education.
(2nd ed.) New York: Houghton Mifflin, 1953.

11. Marrow, A. J. Group dynamics in industry; implications
for guidance and personnel workers.
Occup., 1948, 26, 472-476.

12. McKeachie, W. J. Individual conformity to attitudes
of classroom groups. J. abnorm. soc.
Psychol., 1954, 49, 282-289.

13. Miller, N. E. Learnable drives and rewards. In
S. S. Stevens (Ed.), Handbook of experimental
psychology. New York: Wiley, 1951. P. 468.

14. Munsterberg, H. Psychology and social sanity.
Garden City, N. Y.: Doubleday, 1914.

15. Radke, M., & Klisurich, D. Experiments in
changing food habits. J. Amer. dietetics
Assoc., 1947, 23, 403-409.

D. F. Pennington, Jr., F. Haravey, and B. M. Bass





Journal of Applied Psychology 
Vol. 42, No. 6, 1958 


The Construction and Analysis of a Leadership Behavior
Rating Form 1


W. W. Rambo 2


Occupational Research Center, Purdue University 


Recently, four studies have been reported 
which have dealt with the factor analysis of 
a series of behavioral statements in an at- 
tempt to describe the various dimensions of 
leader behavior. In each of the four studies 
the authors report two independent factors 
which describe the relationships that were 
noted among the responses to these behavioral 
statements. One factor, which has been called 
Fairness to Subordinates (4), Social Respon- 
sibility to Subordinates and Society (9), and 
Consideration (2, 3), refers to behaviors 
which are generally considered under the title 
of human relations. The second type of fac- 
tor, which has been called Administrative 
Achievement (4), Executive Achievement 
(9), and Initiating Structure (2, 3), has 
been derived from behaviors which serve to 
define and circumscribe the behaviors of 
others. For each of the investigations a dif- 
ferent group of supervisory positions was 
used, viz. military officers (3), industrial 
foremen (2), school administrators (4), and 
industrial executives (9). 

The first phase of this study represents an 
attempt to examine the possibility of general- 
izing these two leadership dimensions to the 
behaviors observed among individuals holding 
first line and middle management positions 
in a group of midwestern industrial organiza- 
tions. In attempting this, relatively simple 
rating and item analysis procedures will be 
used in order to arrive at a rating form which 
will reflect internally consistent and mutually 
independent measures of these two dimensions 
of leader behavior. Apart from offering a re- 
search tool which will supplement existing in- 
struments, it is felt that the ability to con- 
struct such a rating form in a new group of 
industrial organizations will add supportive
evidence to the findings of the above-mentioned
research.

1 The author would like to thank C. H. Lawshe
and J. E. Oliver for their assistance in the various
phases of this research.

2 Now at Oklahoma State University.

The second phase of this study deals with 
an attempt to describe the leadership behav- 
iors found within several dimensions of the 
formal organizational structure of a large 
manufacturing concern. The relationship between
leadership dimension ratings and rankings
on over-all supervisory effectiveness will
also be presented. 


Method 


The names given the two factors by Halpin and 
Winer (3) were adopted in the present investigation, 
and the following definitions were prepared: Consideration—Behavior,
initiated by a person in a position
of influence, which is motivated by an awareness
of the needs of the subordinates whose actions
will be affected by this behavior; Initiating Structure—Behavior,
initiated by a person in a position
of influence, which serves to define relationships between
a subordinate and his superiors, his peers, his
subordinates, the goals of the group, and the mate- 
rials and equipment used. 

The literature was searched for concepts which 
might logically relate to these two dimensions of 
leadership, and the concepts selected were trans- 
lated into behavioral statements. Seventy-two items 
were prepared, samples of which follow: Consideration—Does
he blame people under him for his own
failures? During the working day is he friendly in
his contacts with his employees? Initiating Structure—Does
he let you know that he’s the boss?
Does he always make sure his people are kept busy?

The items were placed in random order, and they 
were given to six judges who were asked to assign 
each item to one of four categories, i.e., Consideration,
Initiating Structure, Both, or Neither. They
were asked to read the definitions of the two dimen- 
sions of leadership and assign each item to one of 
the four categories which best described the logical 
content of the item. Each judge rated the items in- 
dependently. Items were selected for the initial 
form of the instrument if they were placed in the 
consideration category by four judges or in the 
initiating structure category by four judges. 
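The judge-agreement rule for selecting items can be sketched briefly. The sketch reads "by four judges" as "by at least four of the six judges", which is an assumption; the function name and category strings are illustrative.

```python
from collections import Counter


def assign_dimension(judgments, min_agreement=4):
    """Return the dimension an item is kept under, or None if it is dropped.

    judgments holds one category per judge: "Consideration",
    "Initiating Structure", "Both", or "Neither".
    """
    counts = Counter(judgments)
    for dimension in ("Consideration", "Initiating Structure"):
        # Assumed reading: at least min_agreement judges chose this dimension.
        if counts[dimension] >= min_agreement:
            return dimension
    return None


# Four of six judges agree, so the item is kept under Consideration.
print(assign_dimension(["Consideration"] * 4 + ["Both", "Neither"]))
```

With six judges, at most one dimension can reach four votes, so the order in which the two dimensions are checked does not affect the result.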

The ratings resulted in the elimination of 18 items 
from the original list of 72. Of the 54 remaining 
items, 26 were placed in the consideration dimension
and 28 were placed in the initiating structure dimension.
These items were placed in a rating form
which provided dichotomous, Yes-No, response categories.

This preliminary form was administered to 151 
management men who represented companies in and 
around Lafayette, Indiana. Seventy-eight of the Ss 
held first-line management positions, 51 held second- 
line management positions, and 22 held positions at 
or above third-line management. For the purposes 
of analysis, the last two groups were combined. Us- 
ing the rating form, the Ss were asked to describe 
the behavior of their immediate superior. The Ss 
were instructed not to include their name or the 
name of their superior on the rating form. 

Arbitrary scoring weights of one and zero were
used in the evaluation of the ratings, with a weight
of one being given to responses indicative of a high 
degree of consideration or structuring. Each rating 
form, therefore, yielded two scores, one for each 
dimension. 
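The one-and-zero scoring can be sketched as below. The item labels are shortened from the sample items quoted earlier, but the keyed directions (which response counts as a high degree of consideration or structuring) are assumptions made for illustration, as are the function and variable names.

```python
# Hypothetical scoring key: item -> (dimension, response scored as one).
KEY = {
    "blames subordinates for own failures": ("Consideration", "No"),
    "friendly in contacts with employees": ("Consideration", "Yes"),
    "lets you know he is the boss": ("Initiating Structure", "Yes"),
    "makes sure his people are kept busy": ("Initiating Structure", "Yes"),
}


def score_form(responses, key=KEY):
    """Sum the unit weights per dimension; unkeyed responses score zero."""
    scores = {"Consideration": 0, "Initiating Structure": 0}
    for item, (dimension, keyed_response) in key.items():
        if responses.get(item) == keyed_response:
            scores[dimension] += 1
    return scores


responses = {
    "blames subordinates for own failures": "No",
    "friendly in contacts with employees": "Yes",
    "lets you know he is the boss": "No",
    "makes sure his people are kept busy": "Yes",
}
print(score_form(responses))
```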

For each management group a separate internal 
consistency item analysis was performed on each di- 
mension (6). The Lawshe-Baker nomograph (7) 
was used to derive the item statistic, Omega, and 
this statistic was transformed into t values. Items
were retained if they yielded t values which were 
significant at the .05 level of significance on both 
internal consistency analyses. 

The proposed instrument demands not only a group 
of homogeneous items for each dimension, but that 
the items in one dimension be independent of the 
items in the other. Therefore, the same item analysis 
procedures were carried out, but this time the items 
in each dimension were examined in order to deter- 
mine whether they could discriminate between the 
criterion groups of the second dimension. Hence, 
each item was again subjected to two more analyses, 
one for each management level. Items were re- 
tained if they yielded at least one t value which was 
not significant at the .05 level. This selection cri- 
terion was felt to be stringent enough for this initial 
“screening” analysis since the items would again be 
subjected to a later analysis. 
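The two retention rules can be combined in a short sketch. It assumes each analysis is summarized by a p value for the item's t statistic and that "significant at the .05 level" means p below .05; the function name and argument layout are hypothetical.

```python
def retain_item(own_p, cross_p, alpha=0.05):
    """Apply both screening rules to one item.

    own_p:   p values from the two internal consistency analyses
             (one per management level) on the item's own dimension.
    cross_p: p values from the two analyses against the criterion
             groups of the other dimension.
    """
    # Homogeneity: significant on both internal consistency analyses.
    homogeneous = all(p < alpha for p in own_p)
    # Independence: at least one cross-dimension analysis not significant.
    independent = any(p >= alpha for p in cross_p)
    return homogeneous and independent


print(retain_item(own_p=(0.01, 0.03), cross_p=(0.20, 0.01)))
```

The independence rule is the more lenient of the two, since an item is kept if it fails to discriminate on the other dimension at only one management level, which matches the text's description of this step as an initial screening.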

The items that were selected from the above se- 
ries of analyses were placed into another form of 
the instrument. This form was administered to 197 
management men who held positions in the manu- 
facturing division of a large automobile manufactur- 
ing organization. One hundred thirty-two of the Ss 
held positions of foreman, 47 were general foremen, 
and 18 were assistant superintendents. Two forms 
from the foreman group were incomplete so they 
were eliminated. A series of behavioral descriptions 
were obtained in each of nine units of the manufac- 
turing division of this organization. These descrip- 
tions extended upwards from the foremen to the su- 
perintendents. Hence, using the rating form, fore- 
men described the leadership behaviors of their 
immediate superiors, the general foremen. The gen- 
eral foremen, in turn, described the leader behavior 
of their superiors, the assistant superintendents, and 
the assistant superintendents described the leadership 
behavior of the superintendents. 
These descriptions reflected the formal organiza-
tional structure of the company since they referred 
to the formally defined leader. The perceived or in- 
formal leader did not enter into the responses to the 
rating form except to the extent that the formal and 
informal structures coincided. As a result of this 
procedure, each member of the supervisory hierarchy 
above the rank of General Foreman was described 
by all of his immediate subordinates. Since the number
of foremen within the several units was rather
large, a representative sample of foremen was selected
to participate in the study. The rank and file
employees were not included in the data; therefore, 
there were no descriptions available for the foremen. 
All rating forms could be identified according to de- 
partment and supervisory level with the exception of 
the behavioral descriptions that referred to the su- 
perintendent group. Here, at the request of the Per- 
sonnel Department, these forms could only be identi- 
fied according to supervisory level. 

In order to estimate the perceived effectiveness 
with which each S carried out his managerial func- 
tions, a rating procedure was initiated which reversed 
the direction of the previously described procedure. 
Therefore, superintendents were ranked on over-all 
supervisory effectiveness by the works manager and 
his assistant; all of the assistant superintendents 
within a given unit were ranked by the superintendent
of that unit; and all of the general foremen in
each unit were ranked by the superintendent of that 
unit. These ranks did not extend across depart- 
mental boundaries. In other words, the superior in 
a given department ranked only those individuals 
within that department who held lower positions on
the supervisory hierarchy; he did not rank indi- 
viduals in other departments. This resulted in nine 
sets of ranks, one for each department. 

After the sets of ratings were completed, the forms 
were grouped according to department, and primary 
and hold-out groups were determined by randomly 
selecting departments until the N for the hold-out 
group approximated the N for the primary group. 
Ninety-seven Ss were included in the primary group. 
A second item analysis was performed on the items 
in the rating form; however, there was no attempt 
to carry out separate analyses for each management 
level. Therefore, following basically the same pro- 
cedure that was previously employed, the items were 
examined for internal consistency. Items were retained
that exhibited t values significant at the .05
level of significance.

Next, each item was examined for independence 
from the second dimension. Items which did not 
display significant discriminability were retained. 

With the completion of the analysis each item had 
undergone six separate item analyses, three for in- 
ternal consistency and three for independence. 


Results 


The first series of item analyses resulted in 
the loss of 14 of the 54 items which had been 
included in the first form of the instrument. 
The final series of item analyses yielded an
instrument which was composed of 29 items, 
12 items measuring behavior described by the 
initiating structure dimension and 17 items 
yielding a measure of the consideration dimension.³

The forms in the hold-out group were 
scored using the scoring key which emerged 
from the final series of item analyses, and a 
split-half reliability coefficient was computed 
for each dimension. The assignment of items 
to the halves was determined randomly. A 
coefficient of .73 was obtained for the con- 
sideration dimension. Stepping up this co- 
efficient by the Spearman-Brown technique 
yielded an estimated coefficient of .84 for a 
form twice the length of one of the halves. 

The split-half reliability estimate obtained 
from the items in the initiating structure di- 
mension was .79. Stepping up this value re- 
sulted in an estimate of .88. 
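
The stepped-up values can be reproduced with the Spearman-Brown formula, r' = kr / (1 + (k - 1)r), with k = 2 for a form twice the length of one half. The sketch below is only an arithmetic check on the two coefficients reported above; the function name is illustrative.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Estimated reliability of a form `length_factor` times as long as the
    form on which the correlation `r_half` was obtained."""
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


consideration = spearman_brown(0.73)
structure = spearman_brown(0.79)
print(f"consideration: {consideration:.2f}, initiating structure: {structure:.2f}")
```

Both results round to the coefficients given in the text (.84 and .88).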

A coefficient of interrater agreement (5) 
was computed for each dimension in order to 
estimate the consistency with which raters 
describe the behavior of a given ratee. This 
statistic permits an estimation of the reli- 
ability of an instrument in situations in which 
there is an unequal number of observations 
recorded for the various individuals on whom 
the instrument is applied. Coefficients of .73 
and .57 were obtained for the consideration 
and structure dimensions, respectively. 

In order to arrive at an estimate of the de- 
gree of relationship existing between the two 
dimensions, a Pearson r was computed from 
the dimension total scores. A correlation of 
.02 was obtained. This r was not significant 
at the .05 level of significance. 

Since the foregoing analysis was concerned 
with the internal characteristics of the instru- 
ment, the second phase of the study was car- 
ried out on the combined primary and hold- 
out groups. 

³ A 4-page table giving items and item statistics
from the final series of item analyses has been deposited
with the American Documentation Institute.
Order Document No. 5770 from ADI Auxiliary Publications
Project, Photoduplication Service, Library
of Congress, Washington 25, D. C., remitting in advance
$1.25 for microfilm or $1.25 for photocopies.
Make checks payable to Chief, Photoduplication
Service, Library of Congress.


Table 1

Analysis of Variance for Consideration Dimension
Scores Classified with Respect to the Horizontal
and Vertical Organizational Axes

Source          Mean Square
Levels               1.22
Departments         20.38
D X L               10.38
Error                9.52
Total


One of the purposes of this study was to
determine whether or not there are differences
in the leadership behaviors that are observed
along several dimensions of the formal or- 
ganizational structure. A two-way classifica- 
tion analysis of variance was performed in 
order to arrive at some estimate of these for- 
mal dimension effects. The horizontal and 
vertical axes of the company’s organization
chart formed the two axes of the analysis of 
variance. However, there was one limitation 
imposed on this analysis because it was im- 
possible to identify the department super- 
vised by each superintendent. This impelled 
the exclusion of this group from the present 
analysis. Tables 1 and 2 present the results 
of the analysis of the scores obtained from 
the two dimensions. 

Examination of Table 1 will demonstrate 
that the F for supervisory levels did not at- 
tain significance at the .05 level. However, 
the test for differences between department 
consideration scores did result in an F value 
which was significant at the .05 level. This
indicates that as one moves across depart- 
ments, the behaviors which are described by 
the consideration dimension demonstrate sig- 
nificant fluctuations. Therefore, it may be 
said that this aspect of supervision is not con- 
sistent among the several horizontal units of 
the formal organizational structure. 

Table 1 also indicates that the F value for 
interaction between departments and super- 
visory level did not reach the value required 
for the .05 level. Therefore, the analysis was 
not able to demonstrate a differential effect 
of departmental factors as a function of the 
particular level of supervision observed. In
terms of group averages, it appears that members
within any one department were functioning
under the same degree of considerateness.


Table 2

Analysis of Variance for Structure Dimension Scores
Classified with Respect to the Horizontal
and Vertical Organizational Axes

Source          df    Mean Square
Levels           1        1.26
Departments      8        7.70
D X L            8        5.74
Error          159        3.38
Total          176

* F.05 = 1.98 (8, 159 df).

The results of a similar analysis on the 
structuring dimension scores are presented in
Table 2. Here it can be seen that the results
parallel those reported for the preceding
analysis of the consideration dimension. 

Since it was desirable to determine whether 
there was a significant difference between the 
mean dimension scores obtained from the su- 
perintendent level and the two levels which 
have just been analyzed, and since this last 
analysis did not indicate a significant differ- 
ence between the two levels, mean scores for 
both dimensions were computed from the pool 
of these two levels. A t test was performed
in order to determine the significance of the
difference between these pooled means and
the mean dimension scores of the superintendent
level. For both levels the t’s were
not significant at the .05 level.

Table 3

Analysis of Variance for Consideration Dimension
Scores Classified with Respect to Leadership
Patterns and Rank Category

Source           df    Mean Square      F
Pattern           3        8.54       <1.00
Rank Category     2       13.04        1.17
P X R             6        8.20       <1.00
Error           118       11.13
Total           129


For each dimension a mean score was computed
from the several descriptions obtained
of the leadership behavior of each assistant
superintendent, and these means were arranged
according to order of magnitude. The
two resulting continua were employed to form
the two axes of a scattergram. Each axis was
dichotomized at the median, thus yielding a
four-cell table which reflected four patterns
of leadership behavior. Under each pattern
were placed the average scores obtained on a
dimension by the general foremen who served
under these four patterns of leadership. The
four groups of general foremen were further
classified according to their rankings on over- 
all supervisory effectiveness. Three classifi- 
cations were used. In departments which 
had six or more general foremen, the top and 
bottom two ranks were assigned to the “good”
and “poor” classifications, and the remaining
Ss were placed in the “average” classification.
For departments which had fewer than six 
general foremen the top and bottom extreme 
rank was assigned to the good and poor cate- 
gories, and the remaining ranks were assigned 
to the middle category. 
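
The two classification steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, ranks are taken to run from 1 (best) to n within a department, and scores falling exactly on the median are placed in the "Lo" half, a tie rule the text does not specify.

```python
from statistics import median


def leadership_pattern(structure, consideration, structure_scores, consideration_scores):
    """Place one superior in the four-cell table formed by median splits.

    Assumption: a score equal to the median counts as "Lo".
    """
    s_half = "Hi" if structure > median(structure_scores) else "Lo"
    c_half = "Hi" if consideration > median(consideration_scores) else "Lo"
    return f"{s_half} Structure-{c_half} Consideration"


def rank_category(rank, n_in_department):
    """Map an effectiveness rank (1 = best) to good / average / poor.

    Departments with six or more general foremen contribute their top and
    bottom two ranks to the extreme categories; smaller departments
    contribute only the single top and bottom rank.
    """
    k = 2 if n_in_department >= 6 else 1
    if rank <= k:
        return "good"
    if rank > n_in_department - k:
        return "poor"
    return "average"


# Hypothetical mean dimension scores for four assistant superintendents.
structure_means = [9.0, 4.0, 7.5, 5.0]
consideration_means = [12.0, 6.0, 5.5, 11.0]
print(leadership_pattern(9.0, 12.0, structure_means, consideration_means))
print([rank_category(r, 7) for r in range(1, 8)])
```

Because each general foreman receives both a pattern and a rank category, the cell sizes are generally unequal, which is why an analysis for disproportionate subclass numbers is needed.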

For both dimensions, a two-way analysis of 
variance was performed for an R X C table
with disproportionate subclass numbers (10). 
In the event of a significant F ratio for pat- 
terns on either dimension, orthogonal com- 
parisons were planned. 

Table 3 presents the results of the analysis
of variance of the consideration dimension.
Here it can be seen that the F values for all
main effects and interactions were not significant
at the .05 level. Table 4 presents the
results of a similar analysis of the structuring
dimension scores. Here it can be seen
that the F values for rank categories and interaction
did not attain significance, but the
value for patterns was significant at the .01
level of significance. This led to the analysis
of the two meaningful orthogonal comparisons.


Table 4

Analysis of Variance for Structure Dimension Scores
Classified with Respect to Leadership
Patterns and Rank Category

Source           df    Mean Square
Pattern           3       15.67
Rank Category     2        5.47
P X R             6        6.20
Error                      3.39
Total

* F.01 = 3.95 (3, 118 df).


Table 5

Orthogonal Comparisons for Structure
Dimension Analysis

Contrast                                 df    Mean Square      F
Hi Structure vs. Lo Structure             1       27.99        8.26*
Hi Structure-Hi Consideration vs.
  Hi Structure-Lo Consideration           1        2.94       <1.00
Error                                   118        3.39
Total                                   120

* F.01 = 6.86 (1, 118 df).

Table 5 presents the results of these com- 
parisons. It can be seen that the F for the 
comparison of the groups which included the 
high structure leadership patterns as com- 
pared with the general foremen serving under 
a “low” degree of structure was significant at 
the .01 level. This result indicates that assistant
superintendents who were high in
structure tended to have general foremen 
serving under them who were also high in 
structuring behavior. Assistant superintend- 
ents who were low in structure had general 
foremen serving under them who were also 
low in the amount of structure they afforded 
their subordinates. 
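
As an arithmetic check on the first comparison, the F ratio is the contrast mean square divided by the error mean square. The short sketch below uses the values printed in Table 5 (27.99 and 3.39) and the critical value given in the table footnote (6.86 for 1 and 118 df); the variable names are illustrative only.

```python
# Values as printed in Table 5 of the report.
contrast_mean_square = 27.99   # Hi Structure vs. Lo Structure, 1 df
error_mean_square = 3.39       # Error, 118 df
critical_f_01 = 6.86           # F.01 (1, 118 df), from the table footnote

f_ratio = contrast_mean_square / error_mean_square
print(f"F = {f_ratio:.2f}")
print("exceeds the .01 critical value:", f_ratio > critical_f_01)
```

The same division applied to the second contrast (2.94 over 3.39) gives a ratio below 1.00, in line with the table entry.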

The second comparison presented in
Table 5 indicates that the degree to which
assistant superintendents who are high in 
structure manifest behavior that is described 
by the upper or lower segments of the con- 
sideration continuum does not influence the 
descriptions of the structuring behavior which 
was obtained from the ratings of the general 
foremen group. 


Discussion 


The construction of the rating form was 
based on the assumption that the importance 
of two dimensions of leadership could be gen-
eralized beyond the situations in which the 
defining factor analyses were performed. The 
judge’s ratings of the initial list of items 
yielded two groups of items which were 
judged to be compatible with one of the defi- 
nitions of the two dimensions. The item 
analysis techniques further “purified” these 
two groups by yielding two homogeneous 
groups of items which were not significantly 
related to each other. 

It is felt that the results of the first phase 
of this study offer support to the four previ- 
ously mentioned studies that used factor 
analysis techniques to describe leader behav- 
ior. Factor analysis is primarily a descrip- 
tive device which does not permit the estab- 
lishment of confidence intervals for rotated 
factor loadings. Therefore, studies which 
demonstrate the emergence or utility of a 
particular factor definition in a new situa- 
tion will lend evidence which will support at- 
tempts to generalize this dimension beyond 
the situation in which the original factor 
analysis was performed. 

Likert and Katz (8) have reported experi- 
mental results which tend to agree with a 
two dimensional concept of industrial leader- 
ship behavior. These authors have identified 
two types of supervisory behavior, behavior 
that is essentially “employee centered” and 
behavior that is “production centered.” These 
two classifications of behavior were obtained 
from a series of survey and interview pro- 
cedures, and an examination of the behaviors 
which have been related to these two super- 
visory “types” will reveal a close similarity 
to the two dimensions of leadership which 
are the topic of this investigation. It seems, 
therefore, that there is considerable evidence 
in the literature which tends to support a 
two-dimensional conception of leadership be- 
havior. 

The analysis of the data which was ob- 
tained from the several supervisory levels, the 
vertical axis of the formal organization, indi- 
cates that there were no significant leader be- 
havior differences existing between these lev- 
els. Since this axis refers to the authority 
hierarchy found in the company, it may be 
said that these two dimensions of leadership 
do not significantly vary with varying degrees
of authority found at the three levels ob- 
served. 

This vertical axis also reflects a scale of in- 
creasing responsibility within the organiza- 
tion. The responsibilities which have been 
formally defined for any one particular level 
may be thought of as a generalization of the 
responsibilities found at the lower levels of 
this axis. It is a generalization of the more 
or less specific responsibilities found at these 
lower levels. The results of this analysis in- 
dicate that increasing scope of formal re- 
sponsibility is not accompanied by significant 
changes in these two dimensions of leader 
behavior. 

The analysis performed on the horizontal 
or departmental axis indicated significant de- 
partmental differences in the scores on the 
two dimensions. Hence, as one moves from
one department to the next he moves across
different leadership behaviors. It must be re- 
membered that this analysis was performed 
using only two levels of supervision; the in- 
terpretation of these findings is made within 
this limitation. 

The significance of the above findings
should be interpreted in the light of the non-
significant interaction between departments 
and supervisory levels. This nonsignificant 
interaction term indicates that the depart- 
mental differences observed were not depend- 
ent upon the particular level of supervision 
that was involved in a given comparison. 
Therefore, even though significant depart- 
mental differences exist, these differences were 
rather consistent for the two levels within the 
departments. This intradepartmental consist- 
ency might well be explained by the leader- 
ship climate concept that has been posited by 
the Ohio State investigators (1, 2). Within 
each department there seems to exist a type 
of supervisory leadership that is found at 
each level of supervision. At first glance it 
would seem that the superior-subordinate re- 
lationships establish a “climate” in which 
rather consistent or similar forms of leader 
behavior emerge. This might be a function 
of the dependent relationship which exists be- 
tween the industrial supervisor and his sub- 
ordinates. However, the analysis of the leader
behavior found under the four different pat-
terns of leadership does not completely bear 
out this explanation, at least within the frame- 
work of the formal organizational structure. 
It will be recalled that on only the struc- 
turing dimension was there some evidence 
of superior—subordinate behavior similarities. 
Here it was found that assistant superintend- 
ents who fell into the high structure group 
tended to have subordinates who were also 
high on this dimension score. The Ohio State 
studies found this relationship on both di- 
mensions. 

The most apparent explanation for the dis- 
crepancies between these two studies is the 
fact that the Ohio State study used the informal
leadership relationships while this study was 
concerned with the formal structure. This 
lack of agreement between the two studies 
might offer some insight into the nature of 
the factors which result in this climate. Basi- 
cally, the behaviors that compose the initiat- 
ing structure dimension relate to a given su- 
perior’s expectations. That is, they relate to 
what he expects of his subordinates. Hence, 
due to the dependent relationship existing be- 
tween the two, in order for these expectations 
to be fulfilled the subordinate must reflect 
this structure down to his own subordinates. 
For example, if the superior requires that his 
subordinates report to him concerning the 
progress of the work, the subordinate, in order
to comply with this requirement, must
expect the same from his subordinates. Since 
the dependent relationship existing between 
the two is primarily defined by the formal or- 
ganization chart (it is the formally defined 
superior who generally evaluates a subordi- 
nate and plays an important role in deter- 
mining pay raises and promotions), it can be 
expected that these behavioral similarities 
will be reflected in the formal superior—sub- 
ordinate relationships. However, unless the 
superior actually inspects or audits subordi- 
nate supervisory behavior with respect to 
considerateness, this leadership similarity 
would not be expected in the formal struc- 
ture. The present results seem to indicate 
that considerate behavior is not a function of 
the formal superior-subordinate interactions. 

The nonsignificant F values obtained from 
the analysis of the two dimensions of leader
behavior with respect to the three classifica- 
tions of supervisory effectiveness present some 
information which concerns the evaluation of 
the general foremen group. The ranks can 
be thought of as a representation of the 
evaluational perceptions of the superintend- 
ent group. The scores obtained by the gen- 
eral foremen on the two dimensions represent 
a description of the behavior that is directed 
at their subordinates, the foremen. The re- 
sults indicate that the estimates of managerial 
effectiveness are not related to supervisory 
behavior as described by those actually su- 
pervised. Hence it is probable that the inter- 
action of the superior and his subordinates 
plays a more significant role in this evalua- 
tion than the relation between the subordi- 
nate and his own subordinates. 


Summary 


A study has been reported the first phase 
of which deals with the construction of a rat- 
ing form which purportedly reflects two di- 
mensions of leadership behavior. These two 
dimensions called Consideration and Initiat- 
ing Structure were derived from research reported
by Halpin and Winer (3), Hobson
(4), Rupe (9), and Fleishman et al. (2).
Item analysis procedures were employed in 
an attempt to obtain a series of behavioral 
statements which were internally consistent 
within a given dimension and yet were inde- 
pendent of the statements in the second dimension.
A rating procedure which required
a “logical analysis” of the statement content
was employed to aid in the interpretation of 
the items which survived the item analysis. 

Stepped-up split-half reliability coefficients
of .84 and .88 were obtained for the consideration
and initiating structure dimensions, respectively.
The two dimensions intercorrelated .02, which was
not significant at the .05 level of significance.

The second phase of the study deals with 
the analysis of the scores obtained from the 
above instrument in relation to the formal 
organizational structure of a large manufac- 
turing concern. Significant behavioral varia- 
tions were observed along the horizontal axis 
of the company, but not up the vertical axis. 
Some evidence is presented which supports 
the leadership climate concept. 

The results indicate that there is no rela- 
tionship existing between scores on the two 
dimensions and rankings of over-all super- 
visory effectiveness. 


Received March 14, 1958. 
Interdependence of Successive Absolute Judgments


Warren W. Willingham ¹
U.S. Naval School of Aviation Medicine 


The simple numerical scale is one of the 
most widely used rating methods. The ob- 
server judges each stimulus independently and 
assigns a numerical scale value to it. Conse- 
quently, this method falls into the broad psy- 
chophysical category of absolute judgment. 
Ideally we would prefer that the individual 
judgments be truly independent. However, 
we are well aware that the judgments will be 
determined to a great extent by the over-all 
stimulus context, and a true absolute scale 
does not exist. On the other hand, when we 
are dealing with a single group or class of 
stimuli all being judged in the same context, 
we are not concerned with this type of rela- 
tivity and we assume that for our purposes, 
the judgments are independent. It will be 
noted that this assumption implies that there 
are no successive response biases. It implies 
that the order of stimulus presentation is of 
no consequence. 

The results of a recent study by Garner 
(1) cast considerable doubt on this assump- 
tion. Garner’s observers made successive ab- 
solute judgments on the loudness level of 
tones. Among other things it was found that 
a middle range stimulus would tend to be 
judged loud if the preceding stimulus was 
loud and weak if the preceding stimulus was 
weak. Two possible explanations for this 
bias are as follows. First, it may be due to 
a simple response bias similar to number 
guessing habits. Secondly, the bias may be 
a sensory phenomenon in that the observer 
may be responding to a summation of the im- 
mediate stimulus and the aftereffect of the 
preceding stimulus. If this latter explanation 
is correct, we would expect the bias only in 
judgments which are predominantly sensory 
in nature. Whereas, if the biases are re- 
sponse habits, we might find such biases in 
any judgment, be it of fact, of value, or
whatnot. It is these more cognitive judgments
which typify most rating situations.
It was the purpose of this experiment to
determine whether successive absolute judgments
of a cognitive nature are interdependent,
and, if so, to evaluate a method for controlling
this bias.

¹ Opinions expressed here are those of the author.
They are not to be construed as necessarily reflecting
the views or endorsement of the Navy Department.


Method 


The rating task which was selected involved rat- 
ing the populations of countries. At its simplest the 
design required that two independent groups of Ss 
rate the population of some “test” country; one 
group having just rated a sparsely populated coun- 
try, the other group having just rated a populous 
country. For example, one group rates Canada after 
Panama and another group rates Canada after China.
If an immediately preceding rating biases a subse- 
quent rating, this bias should be reflected in diver- 
gent mean ratings of Canada by the two groups. 
This type of design was incorporated several times 
in longer lists of words as illustrated in Fig. 1. List 
A was administered to one group of Ss and List B 
to another group. Test Item a, Canada, was in 
fourth position on both lists. Siam was Test Item b,
Burma was Test Item c, and so forth. There were 
eight such test items incorporated in the complete 
list of 26 countries. 

Six groups, each of about 65 Naval Aviation 
Cadets, served as Ss. The total N was 387. Three 
groups rated List A using a 5-, 9-, and 20-point scale,
respectively. Three groups rated List B using a 5-,
9-, and 20-point scale, respectively. The idea of rating
the population of each country on an n-point
scale was briefly explained. Then the Ss were simply
told to go down the list and rate each country
according to where it should stand in relation to all
countries in the world.


Sequential
position     List A       List B
    1        Holland      Holland
    2        Greece       Greece
    3        Panama       China
    4        Canada       Canada       (test item a)
    5        Rumania      Rumania
    6        China        Panama
    7        Siam         Siam         (test item b)
    8        Argentina    Argentina
    9        Iceland      India
   10        Burma        Burma        (test item c)
    .           .            .
             Brazil       Brazil

Fig. 1. Experimental lists employed (each with
a 5, 9, and 20-point scale).


Table 1

Differences Between the Mean Ratings of the Test Items
as Rated on Lists A and B (Mean Rating Following
a Populous Country Minus Mean Rating Following
a Sparsely Populated Country)

Test     5-Point    9-Point    20-Point       Revised 20-Point
Item     Scale      Scale      Scale          Scale
a        -0.1        0.0        0.6            -0.7
b         0.0        0.6        0.9             0.6
c         0.1        0.6        1.3             0.1
d         0.1        0.7*       1.3             0.2
e         0.1        0.1        2.0**           0.6
f        -0.1        0.4        1.1             1.2
g        -0.0       -0.1        [illegible]     1.0
h         0.2        0.4        1.9**           0.3
Aver.     0.03       0.35**     1.08*           0.01

* p = .05.
** p = .01.


Results 


The results obtained for the eight test items 
are shown in the first three columns of Table 1. 
The entries in this table are differences be- 
tween the mean ratings of the paired test 
items in Lists A and B. Thus, the first en- 
try in Table 1 (— 0.1) is the difference ob- 
tained by subtracting the mean rating for 
“Canada following Panama” (List A) from 
the mean rating for “Canada following China” 
(List B). The mean rating which followed a 
sparsely populated country was always sub- 
tracted from the mean rating which followed 
a populous country. The first column of 
Table 1 indicates that the five-point scale 
showed no bias effect. In the case of the nine- 
point scale the differences are generally posi- 
tive, which indicates a tendency to rate the 
test item in the direction of the previous rating.
Using a two-tailed t test with seven degrees
of freedom, the over-all effect is associ-
ated with the .01 level of significance. The 
20-point scale shows an even larger effect in 


417 


the same direction. These results agree with 
those of Garner in showing a shift of the sub- 
jective scale away from the previous rating.
The data are also in agreement with those of 
Garner in showing an increasing effect as the 
number of response categories increases. 
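
The over-all test described above is presumably a one-sample t test of the eight item differences against zero, which is what gives seven degrees of freedom. The sketch below applies that reading to the nine-point column as it can be read from Table 1. The function name is hypothetical, and because the tabled entries are rounded to one decimal and were recovered from a damaged table, the statistic it prints is only an approximation of the one the author computed.

```python
import math
from statistics import mean, stdev


def one_sample_t(differences):
    """Return (t, df) for a test of mean(differences) against zero."""
    n = len(differences)
    standard_error = stdev(differences) / math.sqrt(n)  # stdev uses n - 1
    return mean(differences) / standard_error, n - 1


# Nine-point-scale differences for test items a through h, as read from Table 1.
nine_point = [0.0, 0.6, 0.6, 0.7, 0.1, 0.4, -0.1, 0.4]

t_stat, df = one_sample_t(nine_point)
print(f"t = {t_stat:.2f} with {df} df")
```

Treating the test item, rather than the individual S, as the unit of analysis keeps the test conservative: only eight observations enter each comparison regardless of the number of raters.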

Two additional groups of Ss were tested 
to determine whether revised directions could 
mitigate this bias. Groups of 62 and 66 Ss 
rated the countries with Lists A and B re- 
spectively using a 20-point scale. The experi- 
mental conditions remained constant except 
for one additional instruction. After the 
regular instructions the experimenter com- 
mented that the Ss could probably do a bet- 
ter job if they would rate the high ones first, 
the low ones next, and then the ones in the 
middle. The fourth column of Table 1 shows 
the results under this condition. None of the 
individual items showed a significant differ- 
ence between the two forms, and the over-all 
effect is close to zero. 


Discussion 


As we have previously mentioned, the re- 
sults of this study agree closely with results 
obtained by Garner. That writer had sug- 
gested that the effect might be due to either 
a judgmental bias or a sensory phenomenon. 
Since the judgment involved in this study is 
essentially a question of factual knowledge, 
it is very doubtful that the bias is sensory in 
nature. 

On the surface it would seem that this bias 
is closely related to the anchoring effect in the 
framework of adaptation level theory. How- 
ever, this similarity is more superficial than 
real. It will be remembered that the effect of 
introducing an anchor stimulus before each 
stimulus to be judged is to extend the psy- 
chological scale toward the anchor stimulus. 
That is, numerical ratings tend to be smaller 
after a large anchor and larger after a small 
anchor. Thus, the effect is exactly opposite 
to the effect we have obtained. The same is 
true if we consider the over-all context effect. 
A stimulus will be judged small in a context 
of large stimuli and large in a context of small 
stimuli. It appears likely that the bias dis- 
cussed here operates independently of the 
contextual effects handled by adaptation level
theory. 


Summary 


One of several different forms of a rating 
sheet was administered to 515 Ss. The re- 
sults indicated that ratings tend to be biased 
in the direction of the previous rating, and 
that the bias increases as the number of response
categories increases. No bias effect
was found when the Ss were instructed to 
rate the extreme stimuli first. 


Received March 17, 1958. 
Identification of Cola Beverages: V. A Visual Check ¹


The present study grew naturally out of
four earlier investigations (1, 2, 3, 4) on the 
identification of cola beverages by taste. In 
all of these studies, Coca Cola, Pepsi Cola, 
and R. C. Cola were used. In one experi- 
ment, a fourth, relatively unknown cola 
brand was added. The over-all results in 
the series showed that the same responses 
were “emitted” whether three different bev- 
erages or four different ones were given, 
whether there were three of the same brand 
or four of the same brand, whether they were 
actually the leading brands or practically un- 
known ones and whether Ss were told they 
would sample Coca Cola, Pepsi Cola, R. C. 
Cola, and were actually given these beverages, 
or when they were not told what the cola 
drinks were and they were the three leading 
‘brands, or whether they were told that they 
would be given the leading brands and were 
given some unknown cola or colas. 

Diverse conditions, then, elicited a tedious 
sameness of response and the 645 Ss observed 
to date appeared to respond obsessively with 
the easily triggered phrase, “Coca Cola, Pepsi 
Cola, R. C. Cola.” 

We, therefore, speculated about the possible 
pattern of responses experimental Ss might give if 
they were shown visual cola stimuli presented 
tachistoscopically. What would happen if 
the various cola bottles, bottle caps, and 
brand names were shown at fast exposures? 
Would there be any relationship between the 
visual discriminations of a group of Ss and 
the gustatory discriminations of the Ss of 
our earlier studies? These questions permit 
a test of the hypothesis previously suggested 
in Studies II and III regarding the probable 
effectiveness of advertising and the prevalence 
of cola stimuli in the cultural media, with 
readiness-to-respond with a “signal reaction” 
incorporating the three “leading” brand 
names. These hypotheses would be supported 
if the visual dice are loaded the same 
way as the gustatory ones were. 


1 Grateful acknowledgment is made to Margaret 
Habein, Dean of Liberal Arts, for her generous support 
of this study, to James Rutherford for his help 
in collecting the data, and to J. F. McGovern for 
facilitating the tabulation and processing of the data 
onto IBM cards. 


Procedure 


The Ss of the present study consisted of 
210 students (6 groups of 35 each) from the 
Elementary Psychology courses. After they 
were seated in a Visual Aids screening room, 
they were each given a Data Sheet and were 
asked to fill it out, giving name, sex, and other 
information that was thought to be relevant. 
The room was then darkened but not enough 
to prevent further recording of responses on 
the sheet and the following instructions were 
then read: 

“A series of slides will be flashed on the 
screen at the front of the room. They will 
appear and disappear very quickly. We 
would like your cooperation in trying to see 
what appears there for a fraction of a second 
and in recording what you see. You will 
have just time enough to record your re- 
sponse and to get ready for the next slide 
which will be flashed immediately after you 
hear a ‘Ready’ signal. Naturally, we are 
interested only in your own reaction. This 
is not a test of intelligence and there is no 
right or wrong answer. Whatever you see is 
correct, so please record only what you see 
without regard to what you might see others 
write. Please do not talk to anyone about 
this study either during the experiment or 
afterwards.” 

Prepared 35 mm. color slides were then 
projected on a screen for 1/400 sec. at ap- 
proximately 15-second intervals. There was 
a total of 45 slides in three stimulus categories: 
15 with color reproductions 
of bottles, 15 with color reproductions of 
bottle caps, and 15 black-and-white 
slides containing the typewritten brand names 
Coca Cola, Pepsi Cola, and R. C. Cola (sic). 
Bottle, cap, and brand name slides were each 
subdivided as follows: within each stimulus 
category, three slides presented the stimuli singly, 
six presented them in pairs, and six presented 
them as triples. The pairs and triples were 
shown in counterbalanced order 
so as to control for possible position effects. 
The bottles and caps were presented side by 
side while the typewritten brand names were 
presented in vertical pairs or triples. Pres- 
entation of bottles, caps, and brand names 
was also given in a counterbalanced order to 
the six groups so that each stimulus category 
appeared in a first, second, or third order the 
same number of times. However, the posi- 
tion of each slide within the category was 
the same for all groups. It took about 35 
minutes to run a group of Ss. 
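
The slide design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say exactly how "counterbalanced order" was realized, so the sketch assumes that every ordering of the brands was used once (which yields the six pairs and six triples reported) and that each of the six groups received a different ordering of the three stimulus categories. The function names are hypothetical.

```python
from itertools import permutations

BRANDS = ("Coca Cola", "Pepsi Cola", "R. C. Cola")
CATEGORIES = ("bottles", "bottle caps", "brand names")


def slides_for_category(brands=BRANDS):
    """Return the 15 slide layouts for one stimulus category.

    Assumption: "counterbalanced order" means that every ordering
    of the brands on a slide is used exactly once.
    """
    singles = [(b,) for b in brands]         # 3 single-stimulus slides
    pairs = list(permutations(brands, 2))    # 6 ordered pairs
    triples = list(permutations(brands, 3))  # 6 ordered triples
    return singles + pairs + triples


def category_orders(categories=CATEGORIES):
    """Return the 3! = 6 category orders, one per group (assumed)."""
    return list(permutations(categories))


slides = slides_for_category()
print(len(slides), "slides per category")
print(len(slides) * len(CATEGORIES), "slides in all")
for group, order in enumerate(category_orders(), start=1):
    print(f"Group {group}: " + " -> ".join(order))
```

Under these assumptions each category falls in the first, second, and third position for exactly two of the six groups, which agrees with the statement that each category appeared in each position the same number of times.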


Results and Discussion 


Table 1 indicates that for Bottle Stimuli 
(singles and pairs) Coca Cola is most accu- 
rately identified. The reader will note that 
Coca Cola was correctly identified 689 times 
compared with Pepsi Cola which was cor- 
rectly identified 203 times and R. C. Cola 
only 59 times. It is also apparent that both 
Pepsi and R. C. were misidentified as Coca 
Cola far more frequently (507 and 680 times 
respectively) than Coca Cola was misidentified 
as Pepsi Cola (83 times) or as 
R. C. Cola (54 times). R. C. Cola trails far 
behind in these respects. 
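
As a rough arithmetic check on the bottle figures just quoted, the counts can be tallied by the brand named. The sketch below uses only the numbers given in the text and takes them at face value; cells the text does not report (for example, R. C. misidentified as Pepsi) are omitted, so only the "Coca Cola" response total can be computed. The dictionary layout and the helper name are illustrative assumptions.

```python
# Counts quoted in the text for bottle slides (single and paired
# presentations). Keys are (brand shown, brand named). Cells that
# the text does not report are left out rather than guessed.
bottle_counts = {
    ("Coca Cola", "Coca Cola"): 689,
    ("Pepsi Cola", "Pepsi Cola"): 203,
    ("R. C. Cola", "R. C. Cola"): 59,
    ("Pepsi Cola", "Coca Cola"): 507,
    ("R. C. Cola", "Coca Cola"): 680,
    ("Coca Cola", "Pepsi Cola"): 83,
    ("Coca Cola", "R. C. Cola"): 54,
}


def times_named(brand):
    """Sum every reported count in which `brand` was the response."""
    return sum(n for (_, named), n in bottle_counts.items() if named == brand)


coke_named = times_named("Coca Cola")
coke_correct = bottle_counts[("Coca Cola", "Coca Cola")]
print(coke_named)                           # 1876
print(round(coke_correct / coke_named, 3))  # 0.367
```

On these figures, a little over a third of the reported "Coca Cola" responses to bottle slides were given to an actual Coca Cola bottle.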

The data for bottle caps indicate an advantage 
for Pepsi Cola when presented both 
singly and paired with one of the other 
brands, since the totals of correct identification 
are 578 for Pepsi, 355 for Coca Cola, 
and 206 for R. C. Cola. It appears that 
bottles are favored in the order: Coke, Pepsi, 
and R. C.; and caps, Pepsi, Coke and R. C. 
These results mean that, with respect to the 
competing brands, the Coke bottle and the 
Pepsi cap are leaders in their respective categories. 
Consistent with this, it is interesting 
to note that when the Coke cap is misidentified, 
it is misidentified as Pepsi more frequently 
(190 times) than the Pepsi cap is misidentified 
as Coke (81 times). For the sum of 
single and paired presentations, the R. C. cap is 
more frequently misidentified as Pepsi (161) than as 
Coke (133). 

The data for the slides of the typewritten 
brand names are somewhat different from 
those for bottles and caps. They differ in 
that typed brands presented singly were such 
an easy perceptual task that practically no 
errors appeared. The greater ease of identi- 
fying the typewritten brand names is also 
evident when these materials were shown 
paired in competition with one another. 
However, sufficient errors were made here to 
indicate that under this condition R. C. has 
a greater advantage. For both single and 
paired presentation, R. C. yields 909 correct 
identifications followed by Coca Cola’s 882 
and Pepsi’s 818. These results may indicate 
that our use of the abbreviated form of the 
Royal Crown name gave it a perceptual ad- 
vantage over the other two brands under our 
conditions. 

The tabular data show differential advan- 
tages for the three brands tested depending 
on which stimulus aspect was presented. 
However, when all stimuli are 
considered together, the order of frequency of use 
of the brand names is Coca Cola, Pepsi Cola, 
R. C. Cola. The results here are consistent 
with the findings of our previous studies. 
Though there is some tendency toward inter- 
action based upon the nature of our stimulus 
presentations, in general the hypothesis that 
the brand identifications reflect a “readiness- 
to-respond” dependent upon advertising ap- 
pears to be supported. The results of 
Prothro’s (5) study also support this inter- 
pretation. The results are also suggestive of 
other well known readiness-to-respond reac- 
tions which were related to early and mas- 
sive advertising of such brand name products 
as Victrola and Frigidaire. 
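
The correct-identification totals quoted in the preceding paragraphs (single and paired presentations) can be ranked by category and summed across categories as a quick check. The sketch below uses only those nine figures. It does not reproduce the "frequency of use" of each brand name, which would also include misidentifications that the text reports only in part. The variable and function names are illustrative.

```python
# Correct identifications (single + paired presentations) as quoted
# in the text, by stimulus category and brand.
correct = {
    "bottles": {"Coca Cola": 689, "Pepsi Cola": 203, "R. C. Cola": 59},
    "bottle caps": {"Coca Cola": 355, "Pepsi Cola": 578, "R. C. Cola": 206},
    "brand names": {"Coca Cola": 882, "Pepsi Cola": 818, "R. C. Cola": 909},
}


def rank(counts):
    """Return brands ordered from most to fewest correct identifications."""
    return sorted(counts, key=counts.get, reverse=True)


# Rank order of brands within each stimulus category.
for category, counts in correct.items():
    print(f"{category}: " + ", ".join(rank(counts)))

# Totals across the three categories.
totals = {}
for counts in correct.values():
    for brand, n in counts.items():
        totals[brand] = totals.get(brand, 0) + n

print(totals)
print("over-all: " + ", ".join(rank(totals)))
```

The summed totals come to 1926 for Coca Cola, 1599 for Pepsi Cola, and 1174 for R. C. Cola, the same over-all order of accuracy that the Summary reports.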

The present experiment is related to the 
currently popular area of subliminal percep- 
tion and suggests that lesser known brands 
presented by rapid exposure may readily be 
misidentified as products with more familiar 
brand names. 


Summary 


A group of 210 Ss was asked to identify 45 
tachistoscopic slides presented at 1/400 sec. 
exposure. The slides contained colored transparencies 
of Coca Cola, Pepsi Cola and R. C. 
Cola (a) bottles, (b) bottle caps, and (c) 
typewritten brand names. All three categories 
of stimuli were presented either singly, 
in pairs, or as triples, the latter two in coun- 
terbalanced order. None of the triples data 
was analyzed for this report. The over-all 
results indicate that our Ss showed a greater 
accuracy in responding to the brand names 
Coca Cola, Pepsi Cola, and R. C. Cola in 
that order. They also reveal that when mis- 
identifications were made, they occurred in 
the same sequence. When the categories 
were analyzed separately, it was found that 
the orders of decreasing frequency for accu- 
rate identifications and for misidentifications 
were as follows: bottles—Coca Cola, Pepsi 
Cola, R. C. Cola; bottle caps—Pepsi Cola, 
Coca Cola, R. C. Cola; typewritten brand 
name—R. C. Cola, Coca Cola, Pepsi Cola. 
These findings were related to previous cola 
studies and are believed to support the hypothesis 
that identification of cola beverages 
is more related to the extent and specific nature 
of advertising than to taste, giving certain 
brands under certain stimulus conditions 
a favored position in regard to a readiness-to-respond 
with a particular brand name. 


G. Y. Kenyon and N. H. Pronko 


3. Pronko, N. H., & Bowles, J. W., Jr. 


5. Prothro, E. T. 